{"query": "How did they obtain the dataset?", "gt": "public resources where suspicious Twitter accounts were annotated, list with another 32 Twitter accounts from BIBREF19 that are considered trustworthy", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 20 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. The dataset was obtained by collecting tweets from a list of suspicious Twitter accounts annotated with fake news types and a list of trustworthy accounts.\n2. Duplicate, media-based, and link-only tweets were removed to ensure data quality.\n3. The maximum number of tweets allowed by the Twitter API was collected for each account.\n4. The dataset was built based on two lists annotated in previous works.\n5. The list of suspicious accounts was created using public resources.\n6. The list of trustworthy accounts was gathered from independent third parties.\n7. The dataset was built to detect non-factual Twitter accounts by treating post streams as a sequence of tweets' chunks.\n8. The tweets were sorted by posting date and split into $N$ chunks for each account.\n9. Each chunk consisted of a sorted sequence of tweets labeled by the label of its corresponding account.\n10. A set of features was extracted from each chunk and fed into a recurrent neural network to model the sequential flow of the chunks' tweets.\n11. An attention layer with dropout was used to attend over the most important tweets in each chunk.\n12. The representation was fed into a softmax layer to produce a probability distribution over the account types and thus predict the factuality of the accounts.\n13. The label for an account was obtained by taking the majority class of the account's chunks.\n14. The dataset was used to train and test the proposed approach for detecting non-factual Twitter accounts.\n15. The approach utilized a sequence of tweets belonging to one chunk to capture the sequential flow of tweets.\n16. The model learned the hidden state by capturing the previous timesteps (past tweets).\n17. The produced hidden state at each time step was passed to the attention layer which computed a 'context' vector as the weighted mean of the state sequence.\n18. The approach was compared to several baselines, including LR + Bag-of-words, Tweet2vec, LR + All Features (tweet-level), LR + All Features (chunk-level), and Top-$k$ replies, likes, or re-tweets.\n19. The approach was evaluated based on macro-F1", "evaluation": "f1"}
{"query": "What activation function do they use in their model?", "gt": "relu, selu, tanh", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 3 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. Introduction\n2. Problem Statement\n3. Methodology\na. Input Representation\nb. Features\nc. Model\n4. Experiments and Results\na. Data\nb. Baselines\nc. Experimental Setup\nd. Results\n5. Conclusions\n6. Future Work\n7. References\n\nThe main problem addressed in this paper is the spreading of fake news and rumors on social media platforms, specifically on Twitter. The authors propose an approach to detect non-factual Twitter accounts by treating post streams as a sequence of tweets' chunks. They test several semantic and dictionary-based features together with a neural sequential approach and apply an ablation test to investigate their contribution. The proposed approach is compared against other approaches that discard the chronological order of the tweets or read the tweets individually. The results show that the proposed approach produces superior results at detecting non-factual accounts.\n\nThe methodology consists of three main parts: Input Representation, Features, and Model. The input representation is a sequence of chunks of tweets sorted by posting date, where each chunk consists of a sorted sequence of tweets labeled by the label of its corresponding account. The features extracted from each chunk include emotions, sentiment, morality, style, and words embeddings. The model is a recurrent neural network using long short-term memory (LSTM) to capture the sequential flow of the chunks' tweets. An attention layer with dropout is used to attend over the most important tweets in each chunk, and the representation is fed into a softmax layer to produce a probability distribution over the account types and thus predict the factuality of the accounts.\n\nThe authors use a dataset of Twitter accounts based on two lists annotated in previous works. For the non-factual accounts, they rely on a list of 180 Twitter accounts from a previous work annotated with the main fake news types. For the factual accounts, they use a list with another 32 Twitter accounts considered trustworthy by independent third parties. The dataset is preprocessed to remove duplicate, media-based, and link-only tweets, and for each account, the maximum amount of tweets allowed by Twitter API is collected.\n\nThe proposed approach is compared against several baselines, including a logistic regression (LR) classifier with a bag-of-words representation", "evaluation": "f1"}
{"query": "What baselines do they compare to?", "gt": "Top-$k$ replies, likes, or re-tweets, FacTweet (tweet-level), LR + All Features (chunk-level), LR + All Features (tweet-level), Tweet2vec, LR + Bag-of-words", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 21 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. What is the main problem addressed in the document?\nThe main problem addressed in the document is the detection of suspicious Twitter accounts that spread fake news.\n2. What is the hypothesis of the authors regarding suspicious accounts?\nThe hypothesis of the authors is that suspicious accounts have a unique pattern in posting tweet sequences, and their intentions to mislead are reflected in the way they transition from one set of tweets to the next.\n3. What is the proposed approach to detect non-factual Twitter accounts?\nThe proposed approach is to treat post streams as a sequence of tweets' chunks and test several semantic and dictionary-based features together with a neural sequential approach.\n4. What are the features used in the document?\nThe features used in the document include emotions, sentiment, morality, style, and words embeddings.\n5. What is the main contribution of the authors?\nThe main contribution of the authors is a proposed approach to detect non-factual Twitter accounts by treating post streams as a sequence of tweets' chunks and testing several semantic and dictionary-based features together with a neural sequential approach. The authors also benchmark their approach against other approaches that discard the chronological order of the tweets or read the tweets individually.\n6. What is the methodology used in the document?\nThe methodology used in the document is to read the tweets from the account's timeline, sort them by posting date, and split them into chunks. Then, a set of features is extracted from each chunk, and a recurrent neural network is used to model the sequential flow of the chunks' tweets. An attention layer with dropout is used to attend over the most important tweets in each chunk, and the representation is fed into a softmax layer to produce a probability distribution over the account types and predict the factuality of the accounts.\n7. What is the input representation used in the document?\nThe input representation used in the document is a Twitter account that contains m tweets. These tweets are sorted by date and split into a sequence of chunks, where each chunk contains s tweets. Each tweet in the chunk is represented by a vector in the real number field, which is the concatenation of a set of feature vectors. Each feature vector is built by counting the presence of tweet's words in a set of lexical lists. The final representation", "evaluation": "f1"}
{"query": "How are chunks defined?", "gt": "Chunks is group of tweets from single account that  is consecutive in time - idea is that this group can show secret intention of malicious accounts.", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 26 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. What is the main problem addressed in the document?\nThe main problem addressed in the document is the detection of suspicious Twitter accounts that spread fake news.\n2. What is the hypothesis of the authors regarding suspicious accounts?\nThe hypothesis of the authors is that suspicious accounts have a unique pattern in posting tweet sequences, and their intentions to mislead are reflected in the way they transition from one set of tweets to the next.\n3. What is the proposed approach to detect non-factual Twitter accounts?\nThe proposed approach is to treat post streams as a sequence of tweets' chunks and test several semantic and dictionary-based features together with a neural sequential approach.\n4. What are the features used in the document?\nThe features used in the document include emotions, sentiment, morality, style, and words embeddings.\n5. What is the main contribution of the authors?\nThe main contribution of the authors is the proposal of an approach to detect non-factual Twitter accounts by treating post streams as a sequence of tweets' chunks and testing several semantic and dictionary-based features together with a neural sequential approach.\n6. What is the methodology used in the document?\nThe methodology used in the document is to read the tweets from the account's timeline, sort them by posting date, and split them into chunks. Then, a set of features is extracted from each chunk and fed into a recurrent neural network to model the sequential flow of the chunks' tweets. An attention layer with dropout is used to attend over the most important tweets in each chunk, and the representation is fed into a softmax layer to produce a probability distribution over the account types and predict the factuality of the accounts.\n7. What is the input representation used in the document?\nThe input representation used in the document is a Twitter account that contains m tweets. These tweets are sorted by date and split into a sequence of chunks, where each chunk contains s tweets. Each tweet in the chunk is represented by a vector in the real number field, which is the concatenation of a set of feature vectors.\n8. What is the main finding of the document?\nThe main finding of the document is that the proposed approach produces superior results at detecting non-factual accounts compared to other approaches that discard the chronological order of the tweets or read", "evaluation": "f1"}
{"query": "What features are extracted?", "gt": "Sentiment, Morality, Style, Words embeddings", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 5 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. Emotion: Word occurrences in emotional lexicons.\n2. Sentiment: Word sentiment using various lexicons.\n3. Morality: Word labels based on morality foundation theory.\n4. Style: Stylistic features like uppercase ratio, tweet length.\n5. Words embeddings: Word embeddings using pretrained models.", "evaluation": "f1"}
{"query": "What features are extracted?", "gt": "15 emotion types, sentiment classes, positive and negative, care, harm, fairness, cheating, loyalty, betrayal, authority, subversion, sanctity, and degradation, count of question marks, exclamation marks, consecutive characters and letters, links, hashtags, users' mentions, uppercase ratio, tweet length, words embeddings", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 39 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. Semantic features: word embeddings, sentiment, emotion, morality, style.\n2. Dictionary-based features: lexical lists, canonical stylistic features.\n3. Neural sequential approach: recurrent neural network with attention layer and dropout.", "evaluation": "f1"}
{"query": "Was the approach used in this work to detect fake news fully supervised?", "gt": "Yes", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 1 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. Yes", "evaluation": "f1"}
{"query": "Based on this paper, what is the more predictive set of features to detect fake news?", "gt": "words embeddings, style, and morality features", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 6 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. Methodology: The methodology of the paper involves collecting a dataset of tweets from Twitter accounts that contain propaganda, hoax, and clickbait. The tweets are then sorted by posting date and split into chunks of a fixed size. The chunks are then labeled with the label of their corresponding account. The authors then extract a set of features from each chunk and feed them into a recurrent neural network to model the sequential flow of the chunks' tweets. The output of the model is a probability distribution over the account types, which is used to predict the factuality of the accounts.\n2. Features: The authors extract a set of features from each chunk of tweets, including sentiment, emotion, morality, style, and word embeddings. The sentiment of the tweets is extracted using various lexicons, including EffectWordNet, SenticNet, NRC, and subj\_lexicon. The emotion of the tweets is extracted using two emotional lexicons, NRC and SentiSense. The morality of the tweets is extracted using a theory of morality foundation. The style of the tweets is extracted using canonical stylistic features, such as the count of question marks, exclamation marks, consecutive characters and letters, links, hashtags, users' mentions, uppercase ratio, and tweet length. The word embeddings of the tweets are extracted using the GloVe-840B-300d pretrained model.\n3. Results: The authors compare their approach to several baselines, including a logistic regression classifier with a bag-of-words representation, a bidirectional gated recurrent neural network (Tweet2vec), and a logistic regression classifier with all features at the tweet and chunk level. The authors find that their approach, which models the sequential flow of tweets in chunks, produces superior results at detecting non-factual accounts.\n4. Conclusion: The authors conclude that their approach, which models the sequential flow of tweets in chunks, is effective at detecting non-factual Twitter accounts. They suggest that analyzing tweet streams into chunks and incorporating heterogeneous knowledge sources, such as lexica and text, can improve the detection of fake news. The authors also suggest that future work could incorporate explicit temporal information and develop a multilingual version", "evaluation": "f1"}
{"query": "How big is the dataset used in this work?", "gt": "Total dataset size: 171 account (522967 tweets)", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 7 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "180 non-factual and 32 factual accounts.", "evaluation": "f1"}
{"query": "How is a \"chunk of posts\" defined in this work?", "gt": "chunk consists of a sorted sequence of tweets labeled by the label of its corresponding account", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 16 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. What is the main problem addressed in the document?\nThe main problem addressed in the document is the detection of suspicious Twitter accounts that spread fake news.\n2. What is the hypothesis of the authors regarding suspicious accounts?\nThe hypothesis of the authors is that suspicious accounts have a unique pattern in posting tweet sequences, and their intentions to mislead are reflected in the way they transition from one set of tweets to the next.\n3. What is the proposed approach to detect non-factual Twitter accounts?\nThe proposed approach is to treat post streams as a sequence of tweets' chunks and test several semantic and dictionary-based features together with a neural sequential approach.\n4. What are the features used in the model?\nThe features used in the model include emotions, sentiment, morality, style, and words embeddings.\n5. What is the role of the attention layer in the model?\nThe attention layer in the model computes a context vector by weighting the previous timesteps (past tweets) based on their importance.\n6. What is the main contribution of the authors?\nThe main contribution of the authors is a novel approach to detect factuality at the account level from a textual perspective.\n7. What is the input representation of a Twitter account in the model?\nThe input representation of a Twitter account in the model is a sequence of chunks, where each chunk consists of a sorted sequence of tweets labeled by the label of its corresponding account.\n8. What is the purpose of the ablation test in the model?\nThe purpose of the ablation test is to investigate the contribution of different features in the model.\n9. What is the role of chronological order in the model?\nThe role of chronological order in the model is to preserve the sequence of tweets in each chunk.\n10. What is the output of the model?\nThe output of the model is a probability distribution over the account types and thus predicts the factuality of the accounts.\n11. What is the significance of the majority class label for an account?\nThe majority class label for an account is obtained by taking the class of the account's chunks and is significant for predicting the factuality of the account.\n12. What is the role of lexical lists in the model?\nThe role of lexical lists in the model is to", "evaluation": "f1"}
{"query": "What baselines were used in this work?", "gt": "LR + Bag-of-words, Tweet2vec, LR + All Features (tweet-level), LR + All Features (chunk-level), FacTweet (tweet-level), Top-$k$ replies, likes, or re-tweets", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 21 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. LR + Bag-of-words\n2. Tweet2vec\n3. LR + All Features (tweet-level)\n4. LR + All Features (chunk-level)\n5. Top-$k$ replies, likes, or re-tweets", "evaluation": "f1"}
{"query": "What models are used in the experiment?", "gt": "linear SVM, bidirectional Long Short-Term-Memory (BiLSTM), Convolutional Neural Network (CNN)", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 10 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. Linear SVM\n2. Bidirectional LSTM\n3. Convolutional Neural Network (CNN)", "evaluation": "f1"}
{"query": "What are the differences between this dataset and pre-existing ones?", "gt": "no prior work has explored the target of the offensive language", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 11 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. The OLID dataset is novel in that it includes hierarchical annotation of offensive language, distinguishing between target and type of offense.\n2. Prior datasets have focused on specific aspects of offensive language, such as hate speech or profanity.\n3. OLID is the first to combine multiple sub-tasks in a single dataset, making it a valuable resource for researchers.\n4. It is also publicly available, promoting further study and development in the field of offensive language identification.", "evaluation": "f1"}
{"query": "In what language are the tweets?", "gt": "English", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 1 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. English", "evaluation": "f1"}
{"query": "What is the size of the new dataset?", "gt": "14,100 tweets", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 2 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "14,100 tweets.", "evaluation": "f1"}
{"query": "What is the size of the new dataset?", "gt": "Dataset contains total of 14100 annotations.", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 6 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "14,100 tweets in the OLID dataset.", "evaluation": "f1"}
{"query": "What kinds of offensive content are explored?", "gt": "non-targeted profanity and swearing, targeted insults such as cyberbullying, offensive content related to ethnicity, gender or sexual orientation, political affiliation, religious belief, and anything belonging to hate speech", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 28 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. What is the main problem addressed in the document?\nThe main problem addressed in the document is the pervasiveness of offensive content in social media and the need for computational methods to recognize and deal with this problem.\n2. What are the different aspects of offensive language that have been studied in the past few years?\nThe different aspects of offensive language that have been studied in the past few years include abusive language, cyber-aggression, cyber-bullying, toxic comments, hate speech, and offensive language. These aspects have been studied in platforms such as Twitter and Facebook.\n3. What is the hierarchical annotation model proposed in the document?\nThe hierarchical annotation model proposed in the document is a three-level model that encompasses offensive language detection (Level A), categorization of offensive language (Level B), and offensive language target identification (Level C).\n4. What is the OLID dataset?\nThe OLID dataset is a new large publicly available dataset of English tweets created using the hierarchical annotation model proposed in the document.\n5. What are the key contributions of the paper?\nThe key contributions of the paper are the proposal of a hierarchical three-level annotation model for offensive language identification, the creation of the OLID dataset, and the evaluation of different computational methods for identifying and categorizing offensive language in social media.\n6. What are the different sub-tasks of offensive language identification that have been explored in the past few years?\nThe different sub-tasks of offensive language identification that have been explored in the past few years include aggression identification, bullying detection, hate speech identification, toxic comments identification, and offensive language identification.\n7. What is the TRAC shared task on Aggression Identification?\nThe TRAC shared task on Aggression Identification is a task that provided participants with a dataset containing 15,000 annotated Facebook posts and comments in English and Hindi for training and validation. The task required systems to discriminate between three classes: non-aggressive, covertly aggressive, and overtly aggressive.\n8. What is the Toxic Comment Classification Challenge?\nThe Toxic Comment Classification Challenge is an open competition at Kaggle that provided participants with comments from Wikipedia labeled in six classes: toxic, severe to", "evaluation": "f1"}
{"query": "What is the best performing model?", "gt": "CNN ", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 1 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. The best performing model is the CNN.", "evaluation": "f1"}
{"query": "How many annotators participated?", "gt": "five annotators", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 2 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "6 annotators.", "evaluation": "f1"}
{"query": "What is the definition of offensive language?", "gt": " Most prior work focuses on a different aspect of offensive language such as abusive language BIBREF0 , BIBREF1 , (cyber-)aggression BIBREF2 , (cyber-)bullying BIBREF3 , BIBREF4 , toxic comments INLINEFORM0 , hate speech BIBREF5 , BIBREF6 , BIBREF7 , BIBREF8 , BIBREF9 , BIBREF10 , and offensive language BIBREF11 . Prior work has focused on these aspects of offensive language in Twitter BIBREF3 , BIBREF7 , BIBREF8 , BIBREF11 , Wikipedia comments, and Facebook posts BIBREF2 .", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 77 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. Offensive language refers to any form of communication that is intended to insult, harm, or offend someone. It can take many forms, including verbal abuse, hate speech, and profanity.\n2. The use of offensive language can have serious consequences, including damage to personal relationships, emotional distress, and even physical harm.\n3. Offensive language is often used as a tool of oppression, used to marginalize and dehumanize certain groups of people.\n4. It is important to recognize and address offensive language when it occurs, both to protect the well-being of those who are targeted and to promote a more inclusive and respectful society.\n5. There are many strategies for addressing offensive language, including education, awareness-raising, and the development of policies and guidelines.\n6. Ultimately, the goal is to create a society in which everyone is treated with respect and dignity, and in which offensive language is no longer tolerated.", "evaluation": "f1"}
{"query": "What are the three layers of the annotation scheme?", "gt": "Level A: Offensive language Detection\n, Level B: Categorization of Offensive Language\n, Level C: Offensive Language Target Identification\n", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 19 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. Offensive language detection (Level A): distinguishes between offensive and non-offensive tweets.\n2. Categorization of offensive language (Level B): categorizes the type of offense as targeted or untargeted.\n3. Offensive language target identification (Level C): categorizes the target of the offensive language as individual, group, or other.", "evaluation": "f1"}
{"query": "How long is the dataset for each step of hierarchy?", "gt": "Level A: 14100 Tweets\nLevel B: 4640 Tweets\nLevel C: 4089 Tweets", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 12 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "12 words: The length of the dataset for each step of the hierarchy is not specified in the document.", "evaluation": "f1"}
{"query": "What is the approach of previous work?", "gt": "'shallow' naive Bayes, SVM, hierarchical stacked classifiers, bidirectional recurrent neural networks", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 11 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. What is the main purpose of LID?\n2. What is the significance of LID in developing countries?\n3. What are the four conjunctively written Nguni languages?\n4. What are the three disjunctively written Sotho languages?\n5. What is the proposed LID algorithm in this paper?\n6. What is the purpose of the lexicon based classifier in the proposed algorithm?\n7. How does the lexicon based classifier trade off precision and recall?\n8. What are the three public LID implementations that the proposed algorithm is tested against?\n9. What are the four other algorithms that the proposed algorithm is tested against?\n10. What is the average classification accuracy of the proposed algorithm?\n11. How does the performance of the proposed algorithm depend on the support of the lexicon?", "evaluation": "f1"}
{"query": "Is the lexicon the same for all languages?", "gt": "Yes", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 1 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. No", "evaluation": "f1"}
{"query": "How do they obtain the lexicon?", "gt": "built over all the data and therefore includes the vocabulary from both the training and testing sets", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 17 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. The lexicon is built over all the data and includes vocabulary from both training and testing sets.", "evaluation": "f1"}
{"query": "What evaluation metric is used?", "gt": "average classification accuracy, execution performance", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 5 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. What is the main purpose of LID?\n2. What is the significance of LID in developing countries?\n3. What are the four conjunctively written Nguni languages?\n4. What are the three disjunctively written Sotho languages?\n5. What is the proposed LID algorithm in this paper?\n6. What is the significance of the lexicon in the proposed LID algorithm?\n7. What is the average classification accuracy of the proposed algorithm?\n8. How does the performance of the proposed algorithm depend on the lexicon?\n9. What are some of the research opportunities in LID for South African languages?\n10. What is the significance of data harvesting in LID?", "evaluation": "f1"}
{"query": "Which languages are similar to each other?", "gt": "Nguni languages (zul, xho, nbl, ssw), Sotho languages (nso, sot, tsn)", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 11 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "4 conjunctively written Nguni languages (zul, xho, nbl, ssw) are similar to each other. The same is true of the 3 disjunctively written Sotho languages (nso, sot, tsn) and tshiVenda (ven) and Xitsonga (tso).", "evaluation": "f1"}
{"query": "Which datasets are employed for South African languages LID?", "gt": "DSL 2015, DSL 2017, JW300 parallel corpus , NCHLT text corpora", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 11 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. NCHLT text corpora\n2. DSL 2015 & DSL 2017 shared tasks\n3. JW300 parallel corpus\n4. WiLI-2018 benchmark dataset\n5. NCHLT text corpora\n6. Native Language Identification (NLI)\n7. Language Variety Identification (LVI)\n8. Leipzig Corpora Collection\n9. Universal Declaration of Human Rights\n10. Tatoeba\n11. NCHLT text corpora", "evaluation": "f1"}
{"query": "Does the paper report the performance of a baseline model on South African languages LID?", "gt": "Yes", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 1 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. Yes", "evaluation": "f1"}
{"query": "Does the algorithm improve on the state-of-the-art methods?", "gt": "Yes", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 1 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. Yes", "evaluation": "f1"}
{"query": "Do they report results only on English data?", "gt": "Yes", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 1 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. Yes", "evaluation": "f1"}
{"query": "Do they report results only on English data?", "gt": "The authors showed few tweets where neither and implicit hatred content exist but the model was able to discriminate", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 19 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. Yes, the document focuses on hate speech detection in English.", "evaluation": "f1"}
{"query": "What evidence do the authors present that the model can capture some biases in data annotation and collection?", "gt": "Waseem-dataset, Davidson-dataset,", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 2 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. Bias detection\n2. Manual inspection", "evaluation": "f1"}
{"query": "Which publicly available datasets are used?", "gt": "Waseem and Hovy BIBREF5, Davidson et al. BIBREF9, and Waseem et al. BIBREF10", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 13 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "2 publicly available datasets are used: Waseem and Hovy's dataset and Davidson et al.'s dataset.", "evaluation": "f1"}
{"query": "What baseline is used?", "gt": "BERT based fine-tuning, Insert nonlinear layers, Insert Bi-LSTM layer, Insert CNN layer", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 12 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. What is the main focus of the document?\nThe main focus of the document is to propose a transfer learning approach for hate speech understanding using the pre-trained language model BERT and to examine the effect of different embedding layers of BERT in hate speech detection.\n2. What is the problem of online abusive detection?\nThe problem of online abusive detection is that online platforms such as Twitter, Facebook, and YouTube are increasingly being exploited for the propagation of abusive language and hate-based activities. This has led to an increase in violence attributed to online hate speech, and governments and social network platforms must have tools to detect aggressive behavior in general and hate speech in particular.\n3. What is hate speech?\nHate speech is defined as any communication criticizing a person or a group based on some characteristics such as gender, sexual orientation, nationality, religion, race, etc. It is considered a broad umbrella term for numerous kinds of insulting user-generated content.\n4. What is the trend in detecting online hate speech?\nThe trend in detecting online hate speech is to use deep learning models such as Convolutional Neural Networks (CNNs) and Long Short-Term Memory Networks (LSTMs) to enhance the performance of hate speech detection models.\n5. What is the proposed transfer learning approach?\nThe proposed transfer learning approach is to use the pre-trained language model BERT learned on English Wikipedia and BookCorpus to enhance hate speech detection on publicly available benchmark datasets. The approach involves fine-tuning the BERT model on the downstream task by leveraging syntactical and contextual information of all BERT's transformers.\n6. What are the fine-tuning strategies proposed in the document?\nThe document proposes four different fine-tuning approaches that exploit pre-trained BERTbase transformer encoders for the hate speech detection task. These approaches include using the [CLS] token output provided by BERT, inserting nonlinear layers, inserting a Bi-LSTM layer, and inserting a CNN layer.\n7. What are the datasets used in the study?\nThe study evaluates the proposed method on two widely-studied datasets provided by Waseem and Hovey and Davidson et al. The datasets consist of tweets annotated with racism,", "evaluation": "f1"}
{"query": "What new fine-tuning methods are presented?", "gt": "sampling tweets from specific keywords create systematic and substancial racial biases in datasets", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 13 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "4 new fine-tuning methods are presented.", "evaluation": "f1"}
{"query": "What are the existing biases?", "gt": "Data annotation biases where tweet containing disrespectful words are annotated as hate or offensive without any presumption about the social context of tweeters", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 23 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. The paper proposes a transfer learning approach using the pre-trained language model BERT to enhance hate speech detection on publicly available benchmark datasets.\n2. The approach involves fine-tuning strategies to examine the effect of different embedding layers of BERT in hate speech detection.\n3. The experiment results show that using the pre-trained BERT model and fine-tuning it outperforms previous works in terms of precision, recall, and F1-score.\n4. The model can detect some biases in the process of collecting or annotating datasets, which can be valuable in debiasing hate speech datasets in future studies.", "evaluation": "f1"}
{"query": "What biases does their model capture?", "gt": "Waseem and Hovy BIBREF5, Davidson et al. BIBREF9, and Waseem et al. BIBREF10", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 13 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. What is the main focus of the document?\nThe main focus of the document is the problem of online hate speech detection and the proposed transfer learning approach using the pre-trained language model BERT to enhance the performance of hate speech detection on publicly available benchmark datasets.\n2. What is hate speech?\nHate speech is defined as any communication criticizing a person or a group based on some characteristics such as gender, sexual orientation, nationality, religion, race, etc.\n3. What is the transfer learning approach proposed in the document?\nThe proposed transfer learning approach is to use the pre-trained language model BERT learned on English Wikipedia and BookCorpus to enhance hate speech detection on publicly available benchmark datasets. The document introduces new fine-tuning strategies to examine the effect of different embedding layers of BERT in hate speech detection.\n4. What are the features used in traditional machine learning approaches for online hate speech and offensive language detection?\nThe features used in traditional machine learning approaches for online hate speech and offensive language detection are surface-level features such as bag of words, word-level and character-level $n$-grams, etc.\n5. What is the pre-trained language model used in the proposed approach?\nThe pre-trained language model used in the proposed approach is BERT (Bidirectional Encoder Representations from Transformers).\n6. What is the main advantage of using pre-trained embeddings in NLP tasks?\nThe main advantage of using pre-trained embeddings in NLP tasks is that they capture contextual information and can be fine-tuned for specific tasks using smaller amounts of labeled data.\n7. What is the fine-tuning strategy proposed in the document?\nThe fine-tuning strategy proposed in the document is to use different embedding layers of the pre-trained BERT model and fine-tune them using annotated datasets.\n8. What is the main challenge in detecting online hate speech?\nThe main challenge in detecting online hate speech is the lack of labeled data and the inability of surface features to capture the subtle semantics in text.\n9. What is the proposed solution to address the challenge of detecting online hate speech?\nThe proposed solution to address the challenge of detecting online hate speech is to use the pre-trained language model BERT for hate speech classification", "evaluation": "f1"}
{"query": "What does the \"sensitivity\" quantity denote?", "gt": "the number of distinct word recognition outputs that an attacker can induce", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 12 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. What is the main focus of the paper?\nThe paper focuses on adversarially-chosen spelling mistakes in the context of text classification and addresses attack types such as dropping, adding, and swapping internal characters within words.\n2. What are some examples of real-world problems that involve adversarial misspellings?\nExamples of real-world problems that involve adversarial misspellings include spam emails and programmatic censorship on the internet.\n3. What are some techniques that the paper evaluates for improving the robustness of text classification models against adversarial attacks?\nThe paper evaluates techniques such as data augmentation and adversarial training for improving the robustness of text classification models against adversarial attacks.\n4. What is the main contribution of the paper?\nThe main contribution of the paper is a task-agnostic defense that attaches a word recognition model to improve the robustness of text classification models against adversarial attacks.\n5. How does the proposed defense work?\nThe proposed defense attaches a word recognition model that predicts each word in a sentence given a full sequence of (possibly misspelled) inputs. The word recognition model's outputs form the input to a downstream classification model.\n6. What are some limitations of the proposed defense?\nThe paper acknowledges that the proposed defense may not be effective against all types of adversarial attacks and that there may be trade-offs between the accuracy of the word recognition model and its robustness.\n7. What is the sensitivity of a model?\nThe sensitivity of a model refers to the number of unique outputs it assigns to a set of adversarial perturbations. A high sensitivity indicates that the model is more vulnerable to adversarial attacks.\n8. How is the sensitivity of a model quantified?\nThe sensitivity of a model is quantified as the expected number of unique outputs it assigns to a set of adversarial perturbations.\n9. What is the main conclusion of the paper?\nThe main conclusion of the paper is that adversarial attacks can be highly effective against text classification models, even with small perturbations to the input text. The proposed defense can improve the robustness of these models, but there may be trade-offs between accuracy and robustness.\n10. What are some potential applications of the proposed defense?\nPotential applications of the proposed defense include improving the robust", "evaluation": "f1"}
{"query": "What end tasks do they evaluate on?", "gt": "Sentiment analysis and paraphrase detection under adversarial attacks", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 8 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. What is the main focus of the paper?\n2. What are the three types of adversarial attacks addressed in the paper?\n3. What is the purpose of the word recognition model in the proposed defense?\n4. What is the sensitivity of a word recognition system?\n5. What is the drawback of the evaluation setup used in the paper?\n6. What is the ScRNN model?\n7. What is the background model used in the paper?\n8. What is the main contribution of the paper?\n9. What is the metric used to quantify the sensitivity of word recognition models?\n10. What is the purpose of the pass-through backoff technique?\n11. What is the purpose of the neutral backoff technique?\n12. What is the purpose of the background model backoff technique?\n13. What is the main advantage of the proposed defense?\n14. What is the purpose of the empirical analysis in the paper?\n15. What is the purpose of the synthesizing adversarial attacks?\n16. What is the purpose of the experiments and results section?\n17. What is the purpose of the word error correction section?\n18. What is the purpose of the robust word recognition section?\n19. What is the purpose of the model sensitivity section?\n20. What is the purpose of the understanding model sensitivity section?", "evaluation": "f1"}
{"query": "What is a semicharacter architecture?", "gt": "A semi-character based RNN (ScRNN) treats the first and last characters individually, and is agnostic to the ordering of the internal characters", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 22 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. Introduction\n2. Related Work\n3. Adversarial Attacks on Text Classification\n4. Experimental Setup\n5. Results\n6. Conclusion\n7. References", "evaluation": "f1"}
{"query": "Do they experiment with offering multiple candidate corrections and voting on the model output, since this seems highly likely to outperform a one-best correction?", "gt": "No", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 1 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "\n\nThe authors do not experiment with offering multiple candidate corrections and voting on the model output.", "evaluation": "f1"}
{"query": "Why is the adversarial setting appropriate for misspelling recognition?", "gt": "Adversarial misspellings are a real-world problem", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 6 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. What is the main focus of the paper?\n2. What are the three types of adversarial attacks addressed in the paper?\n3. What is the ScRNN model and how does it work?\n4. What is the drawback of the evaluation setup used in the paper?\n5. How does the proposed solution of placing a word recognition model before the downstream classifier offer benefits?\n6. What is the definition of sensitivity in the context of building robust word-recognizers?\n7. How is sensitivity quantified for a word recognition system?\n8. What is the synthesizing adversarial attacks method?\n9. What is the definition of robustness in the context of the paper?\n10. What is the main contribution of the paper?", "evaluation": "f1"}
{"query": "How do the backoff strategies work?", "gt": "Pass-through passes the possibly misspelled word as is, backoff to neutral word backs off to a word with similar distribution across classes and backoff to background model backs off to a more generic word recognition model trained with larger and less specialized corpus.", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 43 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. Introduction\n* Discusses the problem of adversarial attacks on text classification models\n* Proposes a defense mechanism using a word recognition model\n* Describes experiments and results on various models and attack types\n* Related work section highlights previous research in the field\n2. Experimental Setup\n* Details on the datasets and models used in the experiments\n* Description of the word recognition model and its components\n* Explanation of the backoff strategies used in the defense mechanism\n3. Results\n* Presentation of the results of the experiments on various models and attack types\n* Comparison of the proposed defense mechanism with existing methods\n* Analysis of the effectiveness of the defense mechanism in different scenarios\n4. Conclusion\n* Summarizes the main findings of the study\n* Highlights the importance of addressing adversarial attacks in text classification\n* Suggests directions for future research in the field\n\nOverall, the paper presents a defense mechanism for text classification models against adversarial attacks. The proposed mechanism uses a word recognition model and various backoff strategies to improve the robustness of the model. The effectiveness of the defense mechanism is demonstrated through experiments on various models and attack types. The paper concludes by highlighting the importance of addressing adversarial attacks in text classification and suggesting directions for future research.", "evaluation": "f1"}
{"query": "What baseline model is used?", "gt": "same baseline as used by lang2011unsupervised", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 6 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. What is Semantic Role Labeling (SRL)?\n2. What is the problem of finding predicate-argument structure in a sentence?\n3. What are the labels A0 and A1 in SRL?\n4. What is the need for SRL in different domains and languages?\n5. What is unsupervised SRL?\n6. What is the standard unsupervised SRL setup?\n7. What is the Bayesian model of garg2012unsupervised used in the monolingual model?\n8. How is the global role ordering and repetition preferences modeled in the monolingual model?\n9. What is the Chinese Restaurant Process used for in the multilingual model?\n10. What is the first work to incorporate the coupling of aligned arguments directly in a Bayesian SRL model?", "evaluation": "f1"}
{"query": "Which additional latent variables are used in the model?", "gt": "CLV as a parent of the two corresponding role variables", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 10 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. What is Semantic Role Labeling (SRL)?\n2. What is the problem of finding predicate-argument structure in a sentence?\n3. What are the labels A0 and A1 in SRL?\n4. Why is there a need for unsupervised SRL in different domains and languages?\n5. What is the joint Bayesian model proposed in this paper for unsupervised semantic role induction in multiple languages?\n6. What is the role of crosslingual latent variables in the proposed model?\n7. How does the monolingual model used in this paper differ from previous models?\n8. How does the multilingual model use word alignments between sentences in a parallel corpus?\n9. What is the Chinese Restaurant Process used for in the model?\n10. How does the model perform in different settings involving labeled and unlabeled data?", "evaluation": "f1"}
{"query": "Which additional latent variables are used in the model?", "gt": "crosslingual latent variables", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 3 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. Crosslingual latent variables\n2. Role-specific latent variables\n3. Predicate-specific latent variables", "evaluation": "f1"}
{"query": "Which parallel corpora are used?", "gt": "English (EN) and German (DE) sections of the CoNLL 2009 corpus BIBREF13, EN-DE section of the Europarl corpus BIBREF14", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 19 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. What is Semantic Role Labeling (SRL)?\n2. What is the problem of finding predicate-argument structure in a sentence?\n3. What are the labels A0 and A1 in SRL?\n4. Why is there a need for unsupervised SRL?\n5. What is the joint Bayesian model proposed in this paper?\n6. What is the role of crosslingual latent variables in the model?\n7. How does the multilingual model use word alignments between sentences?\n8. What is the Chinese Restaurant Process used for in the model?\n9. What is the first work to incorporate the coupling of aligned arguments directly in a Bayesian SRL model?\n10. How does the model scale gracefully to more than two languages?\n11. How can the model be used in various semi-supervised learning settings?\n12. What is the metric used for evaluation in this paper?\n13. What is the baseline used for evaluation?\n14. How does this work compare to the cross-lingual unsupervised SRL work of titovcrosslingual?\n15. What are the two main advantages of the monolingual model used in this paper over titovcrosslingual?\n16. What is the closest previous work to this paper?\n17. What corpora are used for the experiments in this paper?\n18. What are the results of the experiments in this paper?\n19. 
What is the motivation for jointly modeling SRL in multiple languages?", "evaluation": "f1"}
{"query": "Overall, does having parallel data improve semantic role induction across multiple languages?", "gt": "No", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 1 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. Yes", "evaluation": "f1"}
{"query": "Do they add one latent variable for each language pair in their Bayesian model?", "gt": "Yes", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 1 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. Yes", "evaluation": "f1"}
{"query": "What does an individual model consist of?", "gt": "Bayesian model of garg2012unsupervised as our base monolingual model", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 9 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. The paper proposes a joint Bayesian model for unsupervised semantic role induction in multiple languages.\n2. The model uses individual Bayesian models for each language and crosslingual latent variables.\n3. The latent variables capture soft role agreement between aligned constituents.\n4. The model is trained on parallel text in two or more languages.\n5. The model is evaluated on different settings involving labeled and unlabeled data.\n6. The model is shown to improve performance in unsupervised semantic role labeling.\n7. The model is applied to the task of unsupervised semantic role induction.\n8. The model is compared to previous work on unsupervised semantic role labeling.\n9. The model is shown to be effective in improving the accuracy of unsupervised semantic role labeling.", "evaluation": "f1"}
{"query": "Do they improve on state-of-the-art semantic role induction?", "gt": "Yes", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 1 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. Yes", "evaluation": "f1"}
{"query": "how many tags do they look at?", "gt": "48,705", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 1 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "48,705", "evaluation": "f1"}
{"query": "which algorithm was the highest performer?", "gt": "A hybrid model consisting of best performing popularity-based approach with the best similarity-based approach", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 14 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. The hybrid approach that combines the best three popularity-based algorithms and the best similarity-based algorithm (INLINEFORM7) achieved the highest performance in terms of accuracy, semantic similarity, and recommendation diversity.", "evaluation": "f1"}
{"query": "how is diversity measured?", "gt": " the average dissimilarity of all pairs of tags in the list of recommended tags", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 14 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. What is the main problem addressed in the paper?\n2. What is the aim of the paper?\n3. What related work has been done in the field of tag recommender systems?\n4. What is the approach used in the paper to improve the performance of tag recommender systems for e-books?\n5. What is the dataset used in the paper?\n6. What are the three types of tag recommendation approaches used in the paper?\n7. What is the evaluation protocol used in the paper?\n8. What are the three evaluation metrics used in the paper?\n9. What are the results of the recommendation accuracy evaluation?\n10. What is the semantic similarity measure proposed in the paper?\n11. How is recommendation diversity measured in the paper?\n12. What is the conclusion of the paper?\n13. What is the future work planned in the paper?\n14. What is the aim of incorporating explanations for recommended tags?", "evaluation": "f1"}
{"query": "how large is the vocabulary?", "gt": "33,663", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 1 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "48,705", "evaluation": "f1"}
{"query": "what dataset was used?", "gt": " E-book annotation data: editor tags, Amazon search terms, and  Amazon review keywords.", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 12 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. The dataset used in this study is a combination of editor tags and Amazon search terms for e-books.", "evaluation": "f1"}
{"query": "what algorithms did they use?", "gt": "popularity-based, similarity-based, hybrid", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 3 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. Popularity-based\n2. Similarity-based\n3. Hybrid approaches", "evaluation": "f1"}
{"query": "What baseline method is used?", "gt": "use the word2vec algorithm, create several unsupervised hand-crafted features, generate document vectors and feed them as input into the support vector machines (SVM) approach", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 24 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. What is sentiment analysis?\nSentiment analysis is the process of identifying and categorizing opinions expressed by reviewers on a topic or entity, typically used in marketing, social media analysis, and customer service.\n2. Why is sentiment analysis important?\nSentiment analysis is important because it can provide valuable insights into customer opinions and preferences, which can help businesses improve their products and services, as well as their marketing strategies.\n3. What is the difference between sentiment analysis and opinion mining?\nSentiment analysis is a specific type of opinion mining that focuses on identifying and categorizing opinions as positive, negative, or neutral. Opinion mining is a broader term that refers to the process of identifying and extracting subjective information from text.\n4. What are the main challenges in sentiment analysis?\nThe main challenges in sentiment analysis include dealing with ambiguous or conflicting opinions, handling out-of-vocabulary words, and dealing with cultural and linguistic differences.\n5. 
What are some common techniques used in sentiment analysis?\nSome common techniques used in sentiment analysis include rule-based approaches, machine learning-based approaches, and deep learning-based approaches.\n6. What is the difference between supervised and unsupervised sentiment analysis?\nSupervised sentiment analysis involves training a model on labeled data, while unsupervised sentiment analysis involves identifying sentiment without labeled data.\n7. What is the difference between lexicon-based and machine learning-based sentiment analysis?\nLexicon-based sentiment analysis uses a pre-built dictionary of words and their associated sentiment scores, while machine learning-based sentiment analysis uses a model trained on a large corpus of text to predict sentiment.\n8. What is the difference between sentiment analysis and emotion analysis?\nSentiment analysis focuses on identifying and categorizing opinions, while emotion analysis focuses on identifying and categorizing emotions expressed in text.\n9. What are some popular tools and libraries for sentiment analysis?\nSome popular tools and libraries for sentiment analysis include NLTK, TextBlob, VADER, and Stanford CoreNLP.\n10. What are some real-world applications of sentiment analysis?\nSome real-world applications of sentiment analysis include customer service, product reviews, social media monitoring, and political analysis.", "evaluation": "f1"}
{"query": "What details are given about the Twitter dataset?", "gt": "one of the Twitter datasets is about Turkish mobile network operators, there are positive, neutral and negative labels and provide the total amount plus the distribution of labels", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 28 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. The Twitter dataset is in Turkish and contains 1,716 tweets about Turkish mobile network operators.\n2. The tweets are manually annotated with positive or negative labels by two humans.\n3. The dataset has a Cohen's Kappa inter-annotator agreement score of 0.82.\n4. The dataset was used to evaluate the proposed sentiment analysis methods.", "evaluation": "f1"}
{"query": "What details are given about the movie domain dataset?", "gt": "there are 20,244 reviews divided into positive and negative with an average 39 words per review, each one having a star-rating score", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 22 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. The paper proposes a novel approach to sentiment analysis in Turkish, which combines unsupervised and supervised methods to generate word embeddings.\n2. The authors use official dictionaries for Turkish to capture sentimental information in the word embeddings.\n3. They also use word2vec embeddings trained on a large corpus and hand-crafted features on a review basis to create document vectors.\n4. The proposed approach outperforms previous studies on sentiment analysis in Turkish media and English corpora of different genres.\n5. The source code and datasets are publicly available.\n6. The paper is organized into five sections, including related work, methodology, experimental results, and conclusion.\n7. The methodology involves several approaches to generate word vectors, including a corpus-based approach, a dictionary-based approach, and a supervised contextual 4-scores approach.\n8. The authors also combine the embeddings generated by these approaches to create more accurate document vectors.\n9. The experimental results show that the proposed approach achieves state-of-the-art results on two Turkish datasets and two English corpora.\n10. 
The authors also perform qualitative assessments of the word representations by visualizing the most similar words to given query words.", "evaluation": "f1"}
{"query": "Which hand-crafted features are combined with word2vec?", "gt": "polarity scores, which are minimum, mean, and maximum polarity scores, from each review", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 13 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. The paper presents a study on sentiment analysis in Turkish and English.\n2. The authors propose several methods for generating word embeddings, including corpus-based and dictionary-based approaches.\n3. The word embeddings are combined with hand-crafted features to create document vectors and perform classification.\n4. The authors evaluate their methods on two datasets and achieve state-of-the-art results.\n5. The paper is organized into sections, including related work, methodology, and experimental results.\n6. The authors also compare their methods to previous studies conducted on sentiment analysis in Turkish and English.\n7. The source code and datasets are publicly available.\n8. The main contribution of the paper is the creation of effective word vectors that capture syntactic, semantic, and sentimental characteristics of words.\n9. The authors also utilize word2vec embeddings trained on a large corpus and generate hand-crafted features on a review basis.\n10. The paper concludes with a discussion of the implications of the study and future work.\n11. The authors suggest that their methods can be applied to other domains and languages with minor changes.\n12. 
The paper highlights the importance of considering sentimental information in generating word embeddings for improved performance in sentiment analysis tasks.\n13. The authors achieve high accuracy in predicting the labels of reviews using their proposed methods.", "evaluation": "f1"}
{"query": "What word-based and dictionary-based feature are used?", "gt": "generate word embeddings specific to a domain, TDK (T\u00fcrk Dil Kurumu - \u201cTurkish Language Institution\u201d) dictionary to obtain word polarities", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 20 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. Word-based features: Contextual information, co-occurrence statistics, and supervised scores.\n2. Dictionary-based features: TDK dictionary, word polarities, and domain-specific polarity scores.", "evaluation": "f1"}
{"query": "How are the supervised scores of the words calculated?", "gt": "(+1 or -1), words of opposite polarities (e.g. \u201chappy\" and \u201cunhappy\") get far away from each other", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 17 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. Sentiment analysis is a hot topic in NLP, especially for Turkish.\n2. Neural networks with word embedding vectors outperform traditional methods.\n3. Most studies use word2vec for word embeddings, but this discards sentimental information.\n4. The authors create effective word vectors that capture syntactic, semantic, and sentimental characteristics.\n5. They use official dictionaries and combine unsupervised and supervised scores to generate word embeddings.\n6. The authors outperform previous studies on sentiment analysis in Turkish media.\n7. They also perform well on English corpora of different genres.\n8. The source code and datasets are publicly available.\n9. The paper is organized into sections on related work, methodology, and experimental results.\n10. Previous studies have used LSA, LDA, and other methods for generating word embeddings.\n11. The authors use a corpus-based approach to capture contextual and semantic information.\n12. They also use a dictionary-based approach to incorporate dictionary definitions and polarity scores.\n13. The supervised contextual 4-scores method uses four supervised scores for each word in the corpus.\n14. The authors combine the three approaches and generate document vectors for reviews.\n15. 
The paper concludes that the proposed methods are effective for sentiment analysis in Turkish.\n16. The authors plan to extend their system to include neutral reviews and use Turkish WordNet for better generalizability.\n17. The study demonstrates the importance of incorporating sentimental information in word embeddings for improved performance in NLP tasks.", "evaluation": "f1"}
{"query": "What topic is covered in the Chinese Facebook data? ", "gt": "anti-nuclear-power", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 1 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "\u6838\u80fd", "evaluation": "f1"}
{"query": "How many layers does the UTCNN model have?", "gt": "eight layers", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 2 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "3 layers.", "evaluation": "f1"}
{"query": "What topics are included in the debate data?", "gt": "abortion, gay rights, Obama, marijuana", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 5 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. Abortion\n2. Gay rights\n3. Obama\n4. Marijuana\n5. Nuclear power", "evaluation": "f1"}
{"query": "What topics are included in the debate data?", "gt": "abortion (ABO), gay rights (GAY), Obama (OBA), and marijuana (MAR)", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 10 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. Abortion\n2. Gay rights\n3. Obama\n4. Marijuana\n5. Nuclear power\n6. Women's rights\n7. Medicine usage\n8. Crime\n9. Radiation\n10. Air pollution", "evaluation": "f1"}
{"query": "What is the size of the Chinese data?", "gt": "32,595", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 1 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "2,496", "evaluation": "f1"}
{"query": "Did they collected the two datasets?", "gt": "No", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 1 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. Yes", "evaluation": "f1"}
{"query": "What are the baselines?", "gt": "SVM with unigram, bigram, trigram features, with average word embedding, with average transformed word embeddings, CNN and RCNN, SVM, CNN, RCNN with comment information", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 24 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. SVM with unigram, bigram, and trigram features\n2. SVM with average word embedding\n3. SVM with average transformed word embeddings\n4. CNN\n5. Recurrent Convolutional Neural Networks (RCNN)\n6. SVM and deep learning models with comment information\n7. UTCNN without user information\n8. UTCNN without the LDA model\n9. UTCNN without comments\n\nThese are the baselines used in the experiment to compare the performance of the proposed model (UTCNN) with other existing models.", "evaluation": "f1"}
{"query": "why are their techniques cheaper to implement?", "gt": "They use a slightly modified copy of the target to create the pseudo-text instead of full BT to make their technique cheaper", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 22 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. What is the main focus of the paper?\nThe paper focuses on understanding the strengths and weaknesses of using monolingual data in Neural Machine Translation (NMT) systems and designing more efficient techniques to improve their effects.\n2. What is the Europarl domain?\nThe Europarl domain refers to a specific set of data used in the study, which consists of the Europarl corpus in several languages, used as in-domain training data for two translation directions (English to German and English to French).\n3. What is the attentional encoder-decoder approach?\nThe attentional encoder-decoder approach is a type of NMT system that uses attention mechanisms to allow the model to focus on different parts of the input sequence during both encoding and decoding.\n4. What is the purpose of the experiments with synthetic parallel corpora?\nThe experiments with synthetic parallel corpora aim to explore various ways to use monolingual data in NMT and analyze the effectiveness of different techniques for generating artificial parallel data.\n5. 
What is the difference between back-translation and forward-translation?\nBack-translation involves translating a sentence from the target language back into the source language, while forward-translation involves translating a sentence from the source language to the target language.\n6. What is the purpose of the GAN setups?\nThe GAN setups aim to create more natural pseudo-sources by using Generative Adversarial Networks (GANs) to generate fake source sentences that look more like natural data.\n7. What is the deep-fusion technique?\nThe deep-fusion technique is a method for integrating a Language Model (LM) into NMT by adding the hidden state of an independently learned LM as an additional input to the softmax layer of the decoder.\n8. What is the main conclusion of the paper?\nThe main conclusion of the paper is that there are cheaper alternatives to using back-translation in NMT systems, such as using a copy of the target or generating artificial data with GANs, which can still improve translation quality and domain adaptation.", "evaluation": "f1"}
{"query": "what data simulation techniques were introduced?", "gt": "copy, copy-marked, copy-dummies", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 3 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. Back-translation\n2. Forward-translation\n3. GANs", "evaluation": "f1"}
{"query": "what is their explanation for the effectiveness of back-translation?", "gt": "when using BT, cases where the source is shorter than the target are rarer; cases when they have the same length are more frequent, automatic word alignments between artificial sources tend to be more monotonic than when using natural sources", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 40 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. Introduction: The paper aims to understand the strengths and weaknesses of back-translation (BT) in Neural Machine Translation (NMT) and design more efficient techniques to improve its effects.\n2. In-domain and out-of-domain data: The study focuses on the Europarl domain, using out-of-domain parallel data and limited in-domain monolingual data.\n3. NMT setups and performance: The baseline NMT system uses an attentional encoder-decoder approach, trained on 4 million out-of-domain parallel sentences.\n4. Using artificial parallel data in NMT: The paper explores various ways to use monolingual data in NMT, including back-translation of various qualities.\n5. Quality of Back-Translation: The quality of back-translation is tested using three MT systems of increasing quality: backtrans-bad, backtrans-good, and backtrans-nmt.\n6. Properties of back-translated data: The study analyzes the linguistic and distributional properties of artificial and natural sources of parallel data.\n7. 
Stupid Back-Translation: The paper examines the effect of using simpler data generation schemes, such as copying the target data or using marked target copies.\n8. Towards more natural pseudo-sources: The study uses Generative Adversarial Networks (GANs) to make pseudo-sources look more like natural data.\n9. Using Target Language Models: The paper compares the use of a target side Language Model (LM) with previous methods.\n10. Re-analyzing the effects of BT: The study analyzes the effect of back-translation on the internals of the network and its impact on the attention model.\n11. Related work: The paper is related to previous studies on using monolingual data in NMT, such as back-translation, forward-translation, and target language modeling.\n12. Conclusion: The paper concludes that back-translation is effective in NMT, but cheaper alternatives such as using copies of the target data with noise or GANs can also improve performance.", "evaluation": "f1"}
{"query": "what dataset is used?", "gt": "Europarl corpus , WMT newstest 2014, News-Commentary-11, Wikipedia from WMT 2014, Multi-UN, EU-Bookshop, Rapid, Common-Crawl (WMT 2017)", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 17 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. The Europarl domain is used with English to German and English to French translation directions.\n2. The dataset includes out-of-domain parallel corpora and limited in-domain monolingual data.\n3. The Europarl corpus is used for training and evaluation purposes.\n4. The WMT newstest 2014 is used for measuring out-of-domain performance.\n5. The NMT systems are trained on 4 million out-of-domain parallel sentences.\n6. Bilingual BPE units are learned with 50k merge operations.\n7. The NMT systems use 512-dimensional word embeddings and a single hidden layer with 1024 cells.\n8. The systems are optimized using Adam and early stopped based on validation performance.\n9. The NMT systems are trained for about three weeks on an Nvidia K80 GPU card.\n10. The baseline NMT system implements the attentional encoder-decoder approach.\n11. The NMT systems are trained using the same out-of-domain corpus with the source and target sides exchanged.\n12. The fine-tuning technique is used with in-domain natural data to improve translation performance.", "evaluation": "f1"}
{"query": "what language pairs are explored?", "gt": "English-German, English-French", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 2 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. English-German\n2. English-French", "evaluation": "f1"}
{"query": "what language is the data in?", "gt": "English , German, French", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 4 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. What is the main focus of the paper?\n2. What is the purpose of using monolingual data in NMT?\n3. What is the difference between back-translation and forward-translation?\n4. What are the results of the experiments conducted in the paper?\n5. What are the limitations of the paper?\n6. How does the use of monolingual data in NMT compare to using a target language model?\n7. What are some possible future directions for research in this area?", "evaluation": "f1"}
{"query": "What metrics are used for evaluation?", "gt": "BLEU-1/4 and ROUGE-L, likelihood of generated recipes using identical input specifications but conditioned on ten different user profiles, user matching accuracy (UMA), Mean Reciprocal Rank (MRR), neural scoring model from BIBREF33 to measure recipe-level coherence", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 35 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "\n\nThe document describes a new approach to generating personalized recipes using natural language processing and recommender systems. The approach combines data-to-text generation and personalized recommendation to generate recipes that are tailored to the user's tastes and preferences. The model takes as input the name of a specific dish, a few key ingredients, and a calorie level, and uses this information to generate a personalized recipe. The model is trained on a large dataset of recipes and user reviews, and is evaluated using quantitative measures of coherence and qualitative analysis. The approach is related to previous work in data-to-text generation and user- and item-dependent aspect-aware review generation. The document concludes by summarizing the main contributions of the approach and outlining future work.", "evaluation": "f1"}
{"query": "What natural language(s) are the recipes written in?", "gt": "English", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 1 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. English", "evaluation": "f1"}
{"query": "What were their results on the new dataset?", "gt": "average recipe-level coherence scores of 1.78-1.82, human evaluators preferred personalized model outputs to baseline 63% of the time", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 18 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. The authors propose a new task of generating personalized recipes from incomplete input specifications by leveraging historical user preferences.\n2. They release a new dataset of 180K+ recipes and 700K+ user reviews for this task.\n3. The authors introduce new evaluation strategies for generation quality in instructional texts, centering on quantitative measures of coherence.\n4. They show qualitatively and quantitatively that personalized models generate high-quality and specific recipes that align with historical user preferences.\n5. The authors' work combines data-to-text generation and personalized recommendation from natural language processing and recommender systems.\n6. The input specification consists of the recipe name, a partial list of ingredients, and a caloric level.\n7. The model uses historical recipe interactions of a user to personalize output.\n8. The encoder has three embedding layers: vocabulary, ingredient, and caloric-level embedding.\n9. The decoder is a two-layer GRU with hidden state conditioned on previous hidden state and input token from the original recipe text.\n10. The authors use attention over encoded ingredients and prior recipes to bias generation toward user preferences.\n11. 
The model fuses all contexts calculated at each decoding time step to determine the final output token distribution.\n12. The authors evaluate their model on a novel dataset of 180K+ recipes and 700K+ user reviews.\n13. The model outperforms baselines in BPE perplexity, generating more diverse and acceptable recipes.\n14. The authors provide qualitative analysis of sample outputs for a cocktail recipe and additional recipes.\n15. They measure personalization by evaluating how closely the generated text corresponds to a particular user profile.\n16. The authors introduce new evaluation strategies for generation quality in instructional texts, centering on quantitative measures of coherence.\n17. The model generates plausible, personalized, and coherent recipes preferred by human evaluators for consumption.\n18. The authors plan to extend their work to generate structured representations of recipes to handle ingredient properties and account for references to collections of ingredients.", "evaluation": "f1"}
{"query": "What are the baseline models?", "gt": "name-based Nearest-Neighbor model (NN), Encoder-Decoder baseline with ingredient attention (Enc-Dec)", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 10 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. The paper proposes a new task of generating personalized recipes from incomplete input specifications by leveraging historical user preferences.\n2. The authors release a new dataset of 180K+ recipes and 700K+ user reviews for this task.\n3. The paper introduces new evaluation strategies for generation quality in instructional texts, centering on quantitative measures of coherence.\n4. The authors show qualitatively and quantitatively that personalized models generate high-quality and specific recipes that align with historical user preferences.\n5. The paper combines two important tasks from natural language processing and recommender systems: data-to-text generation and personalized recommendation.\n6. The model takes as user input the name of a specific dish, a few key ingredients, and a calorie level.\n7. The model uses historical recipe interactions of a user to generate recipe content, rather than writing styles.\n8. The authors attend over historical user preferences from previously consumed recipes to generate recipe content.\n9. The paper is the first to consider personalized text generation for the problem of recipe generation.\n10. 
The paper explores a new task of generating plausible and personalized recipes from incomplete input specifications by leveraging historical user preferences.", "evaluation": "f1"}
{"query": "How did they obtain the interactions?", "gt": "from Food.com", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 2 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. What is the main contribution of the paper?\nThe main contribution of the paper is to propose a new task of generating plausible and personalized recipes from incomplete input specifications by leveraging historical user preferences.\n2. What is the input specification of the model?\nThe input specification of the model consists of the recipe name as a sequence of tokens, a partial list of ingredients, and a caloric level (high, medium, low).\n3. How does the model personalize the output?\nThe model personalizes the output by using historical recipe interactions of a user from the dataset.\n4. What is the encoder of the model?\nThe encoder of the model has three embedding layers: vocabulary embedding, ingredient embedding, and caloric-level embedding. Each token in the recipe name is embedded via the vocabulary embedding, and the embedded token sequence is passed to a bidirectional GRU. Each of the input ingredients is embedded via the ingredient embedding, and the embedded ingredient sequence is passed to another bidirectional GRU to output ingredient hidden states. The caloric level is embedded via the caloric-level embedding and passed through a projection layer to generate a calorie hidden representation.\n5. 
What is the decoder of the model?\nThe decoder is a two-layer GRU with a hidden state conditioned on the previous hidden state and the input token from the original recipe text. The decoder is also conditioned on the outputs of the encoder, which are obtained by attending over the encoded ingredients.\n6. What are the two different schemes used to model preferences from user histories?\nThe two different schemes used to model preferences from user histories are recipe interactions and techniques seen in the recipes.\n7. What is the recipe dataset used in the paper?\nThe recipe dataset used in the paper is collected from Food.com, which consists of 230K+ recipe texts and 1M+ user interactions (reviews) over 18 years (2000-2018).\n8. What is the evaluation metric used in the paper?\nThe evaluation metric used in the paper is based on quantitative measures of coherence, such as perplexity and user-ranking, as well as qualitative analysis of", "evaluation": "f1"}
{"query": "Where do they get the recipes from?", "gt": "from Food.com", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 2 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. What is the main contribution of the paper?\nThe main contribution of the paper is to propose a new task of generating plausible and personalized recipes from incomplete input specifications by leveraging historical user preferences.\n2. What is the new dataset released in the paper?\nThe paper releases a new dataset of 180K+ recipes and 700K+ user reviews for the task of generating plausible and personalized recipes from incomplete input specifications.\n3. What are the two important tasks from natural language processing and recommender systems that the paper combines?\nThe paper combines two important tasks from natural language processing and recommender systems: data-to-text generation and personalized recommendation.\n4. How does the model take user input?\nThe model takes user input in the form of the name of a specific dish, a few key ingredients, and a calorie level.\n5. What is the attention fusion layer in the model?\nThe attention fusion layer in the model fuses `user-aware' representations with decoder output to jointly determine text generation.\n6. What is the problem of recipe generation that the paper addresses?\nThe paper addresses the problem of recipe generation, where output quality is heavily dependent on the content of the instructions, such as ingredients and cooking techniques.\n7. 
What is the new metric for recipe-level coherence introduced in the paper?\nThe paper introduces a new metric for recipe-level coherence that measures the overall recipe step ordering structure by minimizing the cosine similarity of recipe step hidden representations presented in the correct and reverse orders.", "evaluation": "f1"}
{"query": "what are their results on the constructed dataset?", "gt": "AutoJudge consistently and significantly outperforms all the baselines, RC models achieve better performance than most text classification models (excluding GRU+Attention), Comparing with conventional RC models, AutoJudge achieves significant improvement", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 29 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. What is the main idea of the paper?\nThe paper proposes a new framework called Legal Reading Comprehension (LRC) for automatic judgment prediction in civil law systems. LRC incorporates a reading mechanism to better model the complementary inputs of fact description, pleas, and law articles. The authors also introduce a new model called AutoJudge to instantiate the LRC framework.\n2. What are the challenges faced by previous works in automatic judgment prediction?\nPrevious works in automatic judgment prediction face two main challenges: the one-to-many relation between a case and its pleas, and the heterogeneity of input triple. These challenges arise from the complexity of civil law systems and the need to integrate multiple and heterogeneous inputs.\n3. How does the LRC framework address the challenges faced by previous works?\nThe LRC framework addresses the challenges faced by previous works by incorporating a reading mechanism that simulates how humans connect and integrate multiple texts. This mechanism allows for a better modeling of the complementary inputs of fact description, pleas, and law articles. 
The LRC framework also allows for the integration of law articles into the judgment prediction process, which is critical in civil law systems.\n4. What is the AutoJudge model and how does it work?\nAutoJudge is a neural network model that instantiates the LRC framework. It consists of three modules: a text encoder, a pair-wise attentive reader, and an output module. The text encoder encodes the word sequences of inputs into continuous representation sequences. The pair-wise attentive reader uses a pair-wise mutual attention mechanism to capture the complex semantic interaction between text pairs. The output module uses a 1-layer CNN to generate the representation vector for the final prediction.\n5. How is the performance of AutoJudge compared to previous works?\nThe performance of AutoJudge is compared to previous works on a constructed dataset. The results show that AutoJudge outperforms all baselines, including RC models and other neural text classification models, which demonstrates the effectiveness and robustness of the model.", "evaluation": "f1"}
{"query": "what evaluation metrics are reported?", "gt": "precision, recall, F1 and accuracy", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 5 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. Precision, recall, F1, and accuracy.", "evaluation": "f1"}
{"query": "what civil field is the dataset about?", "gt": "divorce", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 1 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. What are the two main challenges faced by previous works in automatic judgment prediction in civil law system?\n2. What is the Legal Reading Comprehension (LRC) framework proposed in the paper?\n3. What is the AutoJudge model proposed in the paper?\n4. What are the three flexible modules in the AutoJudge model?\n5. How does the pair-wise attentive reader module work in the AutoJudge model?\n6. What is the output layer in the AutoJudge model?\n7. How is the final predicted probability calculated in the AutoJudge model?\n8. What is the main contribution of the paper?\n9. What is the name of the dataset used in the experiments?\n10. How is the dataset constructed for evaluation?\n11. What is the name replacement process used in the dataset construction?\n12. What is the law article filtration process used in the dataset construction?\n13. What is the name of the unsupervised method used for law article filtration?\n14. What is the main marriage law articles and their interpretations used in the law article filtration process?\n15. What is the name of the unsupervised method used for extracting relevant law articles?\n16. What is the name of the method used for calculating the relevance between the fact description and the plea in the pair-wise attentive reader module?\n17. 
What is the name of the method used for selecting relevant fact information in the pair-wise attentive reader module?\n18. What is the name of the method used for generating the representation vector for the final prediction in the output layer?\n19. What is the name of the optimization algorithm used in the AutoJudge model?\n20. What is the name of the regular expression used for extracting labeled data from the case documents?", "evaluation": "f1"}
{"query": "what are the state-of-the-art models?", "gt": "SVM , CNN , GRU , CNN/GRU+law, r-net , AoA ", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 10 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. What is the main idea of the paper?\n2. What are the two main challenges in automatic judgment prediction in civil law systems?\n3. What is the Legal Reading Comprehension (LRC) framework proposed in the paper?\n4. What is the AutoJudge model in the paper?\n5. What are the three modules in the AutoJudge model?\n6. How does the text encoder module in the AutoJudge model work?\n7. How does the pair-wise attentive reader module in the AutoJudge model work?\n8. How does the output layer module in the AutoJudge model work?\n9. What is the cross-entropy loss function used in the AutoJudge model?\n10. What is the role of the law articles in the AutoJudge model?", "evaluation": "f1"}
{"query": "what is the size of the real-world civil case dataset?", "gt": "100 000 documents", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 3 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "7 times increase.", "evaluation": "f1"}
{"query": "what datasets are used in the experiment?", "gt": "build a new one, collect INLINEFORM0 cases from China Judgments Online", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 11 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. What is the main idea of the paper?\n2. What are the two main challenges faced by previous works in automatic judgment prediction in civil law systems?\n3. What is the Legal Reading Comprehension (LRC) framework proposed in the paper?\n4. What is the AutoJudge model proposed in the paper?\n5. What are the three modules of the AutoJudge model?\n6. How does the AutoJudge model incorporate law articles into the judgment prediction process?\n7. What is the pair-wise mutual attention mechanism used in the AutoJudge model?\n8. How does the output layer of the AutoJudge model work?\n9. What is the cross-entropy loss function used in the AutoJudge model?\n10. What is the role of the Adam optimizer in the AutoJudge model?\n11. What is the name of the dataset used in the experiments?\n12. How is the dataset constructed for evaluation?\n13. What is the name replacement process used in the dataset construction?\n14. What is the law article filtration process used in the dataset construction?\n15. What are the evaluation metrics used in the experiments?\n16. What are the baselines used in the experiments?\n17. What are the results and analysis of the experiments?\n18. What is the ablation test conducted in the paper?\n19. What is the significance of the attention mechanism in the AutoJudge model?\n20. 
What is the significance of the law articles in the AutoJudge model?", "evaluation": "f1"}
{"query": "What domains are covered in the corpus?", "gt": "No specific domain is covered in the corpus.", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 8 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "11 languages into English, diversified with 11,000+ speakers and 60+ accents.", "evaluation": "f1"}
{"query": "What is the architecture of their model?", "gt": "follow the architecture in berard2018end, but have 3 decoder layers like that in pino2019harnessing", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 14 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. What is the main focus of the paper?\n2. What is the name of the multilingual ST corpus introduced in the paper?\n3. How many languages are included in the CoVoST corpus?\n4. What is the purpose of the Tatoeba evaluation set?\n5. What is the name of the speech recognition corpus used to create CoVoST?\n6. How many hours of speech are included in the CoVoST corpus for each language?\n7. What is the purpose of the quality checks applied to the translations in CoVoST?\n8. What is the name of the language learning corpus used to create the Tatoeba evaluation set?\n9. What is the purpose of the basic statistics presented in Table TABREF2?\n10. How does the amount of speech included in CoVoST compare to existing public ST corpora?\n11. What is the purpose of the speaker diversity analysis presented in the paper?\n12. What is the name of the architecture used for ASR and ST models in the paper?\n13. What is the purpose of the data preprocessing steps described in the paper?\n14. What is the purpose of the model training described in the paper?\n15. What is the purpose of the inference and evaluation described in the paper?\n16. What is the name of the evaluation metric used for MT and ST models in the paper?\n17. What is the purpose of the baseline results presented in the paper?\n18. What is the name of the many-to-one multilingual ST corpus introduced in the paper?\n19. What is the purpose of the speaker demographics analysis presented in the paper?\n20. What is the name of the evaluation set created from Tatoeba?\n21. What is the purpose of the quality checks applied to the TT samples?\n22. What is the name of the evaluation metric used for ASR models in the paper?\n23. What is the purpose of the basic statistics presented in Table TABREF2?\n24. What is the name of the evaluation set created from Tatoeba for French, German, Dutch, Russian and Spanish?", "evaluation": "f1"}
{"query": "How was the dataset collected?", "gt": "Contributors record voice clips by reading from a bank of donated sentences.", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 12 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. The dataset was collected from Common Voice, a crowdsourcing speech recognition corpus.\n2. The dataset includes speeches from 11 languages into English, with over 11,000 speakers and 60 accents.\n3. The dataset was created at the sentence level and does not require additional alignments or segmentation.\n4. The dataset was validated by professional translators and includes translations for train, development, and test splits.\n5. The dataset was created to bridge the performance gaps between end-to-end models and cascading systems.\n6. The dataset was created to provide a larger and more diverse dataset for speech-to-text translation research.\n7. The dataset was created to provide a dataset for studying end-to-end multilingual speech-to-text translation.\n8. The dataset was created to provide a dataset for studying speech-to-text translation in low-resource languages.\n9. The dataset was created to provide a dataset for studying speech-to-text translation in rare languages.\n10. The dataset was created to provide a dataset for studying speech-to-text translation in specific domains.\n11. The dataset was created to provide a dataset for studying speech-to-text translation in multiple languages.\n12. The dataset was created to provide a dataset for studying speech-to-text translation in multiple languages and domains.", "evaluation": "f1"}
{"query": "Which languages are part of the corpus?", "gt": "French, German, Dutch, Russian, Spanish, Italian, Turkish, Persian, Swedish, Mongolian and Chinese", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 12 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "11 languages are part of the CoVoST corpus: French, German, Dutch, Russian, Spanish, Italian, Turkish, Persian, Swedish, Mongolian, and Chinese.", "evaluation": "f1"}
{"query": "How is the quality of the data empirically evaluated? ", "gt": "Validated transcripts were sent to professional translators., various sanity checks to the translations,  sanity check the overlaps of train, development and test sets", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 23 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. The quality of the data is empirically evaluated using various sanity checks such as BLEU, perplexity, and human inspection.\n2. The translations are checked against human translations produced by a state-of-the-art system, and those with low scores are sent back to the translators.\n3. The similarity scores between transcripts and translations are calculated using LASER cross-lingual sentence embeddings, and samples with low scores are manually inspected and sent back for translation.\n4. The overlaps of train, development, and test sets in terms of transcripts and voice clips are checked, and they are found to be totally disjoint.\n5. The quality of the translations in Tatoeba is not explicitly mentioned, but it is noted that the translations had access to the transcripts but not the corresponding voice clips.", "evaluation": "f1"}
{"query": "Is the data in CoVoST annotated for dialect?", "gt": "No", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 1 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. No", "evaluation": "f1"}
{"query": "Is Arabic one of the 11 languages in CoVost?", "gt": "No", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 1 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "\u0646\u0639\u0645 (Yes)", "evaluation": "f1"}
{"query": "What kind of model do they use?", "gt": "visual model is based on fine-tuning an Inception V3 model BIBREF1 over visual renderings of documents, while our textual model is based on a hierarchical biLSTM. We further combine the two into a joint model. , neural network models", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 39 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. What is the task of document quality assessment?\n2. What are some examples of document quality assessment in different contexts?\n3. What is the main contribution of this paper?\n4. What is the visual model proposed in this paper?\n5. What is the textual model proposed in this paper?\n6. How are the visual and textual models combined in the joint model?\n7. What are the two datasets used in the experiments?\n8. How are the Wikipedia dataset and arXiv dataset different?\n9. What is the Inception V3 model used for in this paper?\n10. What is the biLSTM model used for in this paper?\n11. How is the joint model trained and optimized?\n12. What are the baseline approaches used in the comparison?\n13. What are the experimental results on the Wikipedia dataset?\n14. What are the experimental results on the arXiv dataset?\n15. How do the visual and textual features complement each other in the joint model?\n16. What is the significance of the Joint model's performance on the Wikipedia dataset?\n17. What is the significance of the Joint model's performance on the arXiv dataset?\n18. How do the visual and textual features complement each other in the joint model?\n19. What is the main conclusion of the paper?", "evaluation": "f1"}
{"query": "What kind of model do they use?", "gt": "Inception V3, biLSTM", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 3 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. Joint model\n2. Visual and textual\n3. Inception V3", "evaluation": "f1"}
{"query": "Did they release their data set of academic papers?", "gt": "No", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 1 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. Yes", "evaluation": "f1"}
{"query": "Do the methods that work best on academic papers also work best on Wikipedia?", "gt": "No", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 1 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "\n\nThe proposed joint model combines visual and textual features to predict document quality. The visual features are obtained by fine-tuning an Inception V3 model over visual renderings of documents, while the textual features are obtained from a hierarchical biLSTM. The two models are combined into a joint model, which is trained on two datasets: a Wikipedia dataset and an arXiv dataset split into three sub-parts based on subject category. The experimental results show that the joint model achieves state-of-the-art results over 3/4 of the datasets, indicating that the combination of visual and textual features can improve the accuracy of document quality assessment.", "evaluation": "f1"}
{"query": "What is their system's absolute accuracy?", "gt": "59.4% on wikipedia dataset, 93.4% on peer-reviewed archive AI papers, 77.1%  on peer-reviewed archive Computation and Language papers, and 79.9% on peer-reviewed archive Machine Learning papers", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 26 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "87.6% accuracy achieved on Wikipedia dataset, and state-of-the-art results on 3/4 of the datasets.", "evaluation": "f1"}
{"query": "Which is more useful, visual or textual features?", "gt": "It depends on the dataset. Experimental results over two datasets reveal that textual and visual features are complementary. ", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 18 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "\n\nThe paper proposes a joint model that integrates visual and textual features to assess document quality. The visual model is based on fine-tuning an Inception V3 model over visual renderings of documents, while the textual model is based on a hierarchical biLSTM. The proposed model achieves state-of-the-art results over three-fourths of the datasets. The paper contributes to the field of document quality assessment by demonstrating the effectiveness of combining visual and textual features.", "evaluation": "f1"}
{"query": "Which languages do they use?", "gt": "English", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 1 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. English", "evaluation": "f1"}
{"query": "How large is their data set?", "gt": "a sample of  29,794 wikipedia articles and 2,794 arXiv papers ", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 10 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "29,794 articles in the Wikipedia dataset, with varying numbers of articles in the arXiv dataset.", "evaluation": "f1"}
{"query": "Where do they get their ground truth quality judgments?", "gt": "quality class labels assigned by the Wikipedia community, a paper is considered to have been accepted (i.e. is positively labeled) if it matches a paper in the DBLP database or is otherwise accepted by any of the following conferences: ACL, EMNLP, NAACL, EACL, TACL, NIPS, ICML, ICLR, or AAAI", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 49 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. The authors use the quality class labels assigned by the Wikipedia community for the Wikipedia dataset.\n2. For the arXiv dataset, a paper is considered accepted if it matches a paper in the DBLP database or is otherwise accepted by certain conferences.", "evaluation": "f1"}
{"query": "What is the seed lexicon?", "gt": "a vocabulary of positive and negative predicates that helps determine the polarity score of an event", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 16 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. The seed lexicon is a small set of positive and negative words used to initialize the learning of affective events.\n2. It includes words like \"rejoice,\" \"sad,\" \"excited,\" \"scared,\" \"happy,\" \"lonely,\" \"worried,\" and \"sorry.\"\n3. The lexicon is used to assign polarity scores to events based on the presence of these words.\n4. The model is then trained to learn more complex relationships between events using discourse relations.\n5. The seed lexicon is a crucial component of the proposed method for learning affective events.", "evaluation": "f1"}
{"query": "What is the seed lexicon?", "gt": "seed lexicon consists of positive and negative predicates", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 8 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "15 positive and 15 negative words.", "evaluation": "f1"}
{"query": "What are the results?", "gt": "Using all data to train: AL -- BiGRU achieved 0.843 accuracy, AL -- BERT achieved 0.863 accuracy, AL+CA+CO -- BiGRU achieved 0.866 accuracy, AL+CA+CO -- BERT achieved 0.835, accuracy, ACP -- BiGRU achieved 0.919 accuracy, ACP -- BERT achived 0.933, accuracy, ACP+AL+CA+CO -- BiGRU achieved 0.917 accuracy, ACP+AL+CA+CO -- BERT achieved 0.913 accuracy. \nUsing a subset to train: BERT achieved 0.876 accuracy using ACP (6K), BERT achieved 0.886 accuracy using ACP (6K) + AL, BiGRU achieved 0.830 accuracy using ACP (6K), BiGRU achieved 0.879 accuracy using ACP (6K) + AL + CA + CO.", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 94 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "\u5728\u672c\u6587\u4e2d\uff0c\u4f5c\u8005\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8e\u63a8\u7406\u5173\u7cfb\u7684\u65b9\u6cd5\uff0c\u7528\u4e8e\u4ece\u5c0f\u7684\u6ce8\u91ca\u6570\u636e\u96c6\u4e2d\u5b66\u4e60\u60c5\u611f\u4e8b\u4ef6\u7684\u6027\u8d28\u3002\u8be5\u65b9\u6cd5\u5229\u7528\u4e00\u4e2a\u5c0f\u7684\u79cd\u5b50\u8bcd\u5178\u548c\u5927\u578b\u539f\u59cb\u8bed\u6599\u5e93\uff0c\u901a\u8fc7\u63a8\u7406\u5173\u7cfb\u6765\u6709\u6548\u5730\u4f20\u64ad\u79cd\u5b50\u4e8b\u4ef6\u7684\u60c5\u611f\u6027\u8d28\u3002\u4f5c\u8005\u4f7f\u7528\u4e86\u4e24\u79cd\u4e0d\u540c\u7684\u7f16\u7801\u5668\uff08BiGRU\u548cBERT\uff09\uff0c\u5e76\u5728\u4e0d\u540c\u7684\u6570\u636e\u96c6\u7ec4\u5408\u4e0b\u8fdb\u884c\u4e86\u8bad\u7ec3\u3002\u7ed3\u679c\u8868\u660e\uff0c\u4f7f\u7528BiGRU\u7f16\u7801\u5668\u548cAL+CA+CO\u6570\u636e\u96c6\u7ec4\u5408\u7684\u6a21\u578b\u5728\u51c6\u786e\u6027\u65b9\u9762\u8868\u73b0\u6700\u597d\u3002\u5c3d\u7ba1CA\u548cCO\u6570\u636e\u96c6\u4e2d\u5b58\u5728\u566a\u58f0\uff0c\u4f46\u8be5\u65b9\u6cd5\u4ecd\u7136\u53ef\u4ee5\u6709\u6548\u5730\u5b66\u4e60\u60c5\u611f\u4e8b\u4ef6\u7684\u6027\u8d28\u3002", "evaluation": "f1"}
{"query": "How are relations used to propagate polarity?", "gt": "cause relation: both events in the relation should have the same polarity; concession relation: events should have opposite polarity", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 19 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "\n\nIn the proposed method, discourse relations are used to propagate polarity of affective events. If a seed predicate directly reports one's emotions, and events $x\\_1$ and $x\\_2$ are in the discourse relation of Cause (i.e., $x\\_1$ causes $x\\_2$), the polarity of $x\\_2$ is used to predict the polarity of $x\\_1$. Similarly, if $x\\_1$ and $x\\_2$ are in the discourse relation of Concession (i.e., $x\\_2$ in spite of $x\\_1$), the reverse of $x\\_2$'s polarity can be propagated to $x\\_1$.", "evaluation": "f1"}
{"query": "How big is the Japanese data?", "gt": "The ACP corpus has around 700k events split into positive and negative polarity ", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 13 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "100 million sentences.", "evaluation": "f1"}
{"query": "What are labels available in dataset for supervision?", "gt": "negative, positive", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 2 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "\n\nThe dataset used for supervision in this paper is the ACP Corpus.", "evaluation": "f1"}
{"query": "How big are improvements of supervszed learning results trained on smalled labeled data enhanced with proposed approach copared to basic approach?", "gt": "3%", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 1 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "5%", "evaluation": "f1"}
{"query": "How does their model learn using mostly raw data?", "gt": "by exploiting discourse relations to propagate polarity from seed predicates to final sentiment polarity", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 14 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "\n\nThe model learns using mostly raw data by leveraging discourse relations to efficiently propagate polarity from seed predicates, which are automatically extracted from the raw corpus.", "evaluation": "f1"}
{"query": "How big is seed lexicon used for training?", "gt": "30 words", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 2 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "15 positive, 15 negative.", "evaluation": "f1"}
{"query": "How large is raw corpus used for training?", "gt": "100 million sentences", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 3 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "100 million sentences.", "evaluation": "f1"}
{"query": "What data is the language model pretrained on?", "gt": "Chinese general corpus", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 3 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. What is Clinical Text Structuring (CTS)?\n2. What are the challenges of end-to-end CTS?\n3. What are the traditional methods for addressing CTS tasks?\n4. What is the Question Answering based Clinical Text Structuring (QA-CTS) task?\n5. How does the QA-CTS task unify different specific tasks?\n6. What is the main contribution of this work?\n7. What is the proposed model for the QA-CTS task?\n8. How does the proposed model integrate clinical named entity information into a pre-trained language model?\n9. What is the two-stage training mechanism used in the proposed model?\n10. How does the proposed model perform compared to strong baseline methods?", "evaluation": "f1"}
{"query": "What baselines is the proposed model compared against?", "gt": "BERT-Base, QANet", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 2 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "\n\nThe proposed model is compared against strong baseline methods.", "evaluation": "f1"}
{"query": "How is the clinical text structuring task defined?", "gt": "Clinical text structuring (CTS) is a critical task for fetching medical research data from electronic health records (EHRs), where structural patient medical data, such as whether the patient has specific symptoms, diseases, or what the tumor size is, how far from the tumor is cut at during the surgery, or what the specific laboratory test result is, are obtained., Unlike the traditional CTS task, our QA-CTS task aims to discover the most related text from original paragraph text. ", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 78 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "\u5728\u8fd9\u7bc7\u6587\u7ae0\u4e2d\uff0c\u4f5c\u8005\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u95ee\u9898\u56de\u7b54\u57fa\u4e8e\u4e34\u5e8a\u6587\u672c\u7ed3\u6784\u5316\uff08QA-CTS\uff09\u4efb\u52a1\uff0c\u8be5\u4efb\u52a1\u65e8\u5728\u627e\u51fa\u4e0e\u539f\u59cb\u6bb5\u843d\u6587\u672c\u6700\u76f8\u5173\u7684\u6587\u672c\u3002\u8fd9\u79cd\u65b9\u6cd5\u53ef\u4ee5\u4f5c\u4e3a\u4e00\u79cd\u7ed3\u6784\u5316\u4e34\u5e8a\u6587\u672c\u7684\u65b0\u65b9\u6cd5\uff0c\u5b83\u53ef\u4ee5\u5c06\u4e0d\u540c\u7684\u4e34\u5e8a\u6587\u672c\u7ed3\u6784\u4efb\u52a1\u7684\u6570\u636e\u5171\u4eab\uff0c\u4ece\u800c\u589e\u52a0\u8bad\u7ec3\u6570\u636e\u7684\u6570\u91cf\u3002\u4f5c\u8005\u8fd8\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u6a21\u578b\uff0c\u8be5\u6a21\u578b\u5c06\u4e34\u5e8a\u547d\u540d\u5b9e\u4f53\u4fe1\u606f\u4e0e\u9884\u8bad\u7ec3\u8bed\u8a00\u6a21\u578b\u96c6\u6210\u5728\u4e00\u8d77\uff0c\u5e76\u9002\u7528\u4e8eQA-CTS\u4efb\u52a1\u3002\u8be5\u6a21\u578b\u901a\u8fc7\u5c06\u547d\u540d\u5b9e\u4f53\u8bc6\u522b\u7ed3\u679c\u4e0e\u4e0a\u4e0b\u6587\u8868\u793a\u6a21\u578b\u7ed3\u5408\u8d77\u6765\uff0c\u5e76\u5c06\u5176\u4f20\u9012\u5230\u4e00\u4e2a\u5d4c\u5165\u5668\u4e2d\uff0c\u4ece\u800c\u751f\u6210\u6700\u7ec8\u7684\u7b54\u6848\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u8be5\u6a21\u578b\u5728\u4e09\u4e2a\u7279\u5b9a\u4efb\u52a1\u4e2d\u7684\u8868\u73b0\u4f18\u4e8e\u5f3a\u57fa\u7ebf\u65b9\u6cd5\u3002", "evaluation": "f1"}
{"query": "How is the clinical text structuring task defined?", "gt": "CTS is extracting structural data from medical research data (unstructured). Authors define QA-CTS task that aims to discover most related text from original text.", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 24 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. The clinical text structuring task is defined as a question answering based task that aims to discover the most related text from the original paragraph text.\n2. It unifies different specific tasks and makes the dataset shareable, improving performance and generality.\n3. The task involves integrating clinical named entity information into a pre-trained language model and using a two-stage training mechanism.\n4. The goal is to improve the accuracy of structured data extraction from clinical text for downstream clinical research.\n5. Traditional methods use rule and dictionary based methods, task-specific end-to-end methods, and pipeline methods, but they have limitations.\n6. The proposed model uses a contextualized representation of sentence text and query text, clinical named entity information, and a feed-forward network for final prediction.\n7. The model is trained on a large-scale dataset and evaluated using metrics such as EM-score and F1-score.\n8. The model outperforms state-of-the-art methods in terms of accuracy and provides a new approach to clinical text structuring.", "evaluation": "f1"}
{"query": "What are the specific tasks being unified?", "gt": " three types of questions, namely tumor size, proximal resection margin and distal resection margin", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 14 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. What is the main contribution of this work?\nThe main contribution of this work is the introduction of a question answering based clinical text structuring (QA-CTS) task, which unifies different specific tasks and makes the dataset shareable. The authors also propose an effective model to integrate clinical named entity information into a pre-trained language model.", "evaluation": "f1"}
{"query": "Is all text in this dataset a question, or are there unrelated sentences in between questions?", "gt": "the dataset consists of pathology reports including sentences and questions and answers about tumor size and resection margins so it does include additional sentences ", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 24 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "\u5728\u8fd9\u7bc7\u6587\u7ae0\u4e2d\uff0c\u6240\u6709\u7684\u6587\u672c\u90fd\u662f\u95ee\u9898\uff0c\u6ca1\u6709\u4e0d\u76f8\u5173\u7684\u53e5\u5b50\u3002", "evaluation": "f1"}
{"query": "How many questions are in the dataset?", "gt": "2,714 ", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 1 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1714", "evaluation": "f1"}
{"query": "How they introduce domain-specific features into pre-trained language model?", "gt": "integrate clinical named entity information into pre-trained language model", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 9 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. The document discusses the importance of clinical text structuring (CTS) in extracting structured data from electronic health records (EHRs) for bio-medical research.\n2. CTS is a challenging task due to non-uniform output formats and the need for large amounts of labeled data for training neural networks.\n3. The authors propose a question answering based clinical text structuring (QA-CTS) task to unify different specific tasks and make dataset shareable.\n4. The QA-CTS task aims to discover the most related text from original paragraph text and can be used for tasks such as entity names conversion and negative words recognition.\n5. The authors present an effective model for the QA-CTS task that integrates clinical named entity information into a pre-trained language model.\n6. The proposed model uses a two-stage training mechanism and contextualized representation of sentence text and query text.\n7. The authors also use a clinical named entity recognition (CNER) model to capture named entity information and obtain one-hot CNER output tagging sequence for query text and paragraph text.\n8. The authors evaluate the proposed model on a dataset of Chinese pathology reports and achieve significant improvement in performance compared to traditional CTS methods.\n9. 
The authors conclude that the QA-CTS task and proposed model can help overcome the challenges of CTS and provide large-scale extracted structured data for downstream clinical research.", "evaluation": "f1"}
{"query": "How big is QA-CTS task dataset?", "gt": "17,833 sentences, 826,987 characters and 2,714 question-answer pairs", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 8 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "17,833 sentences, 826,987 characters, 2,714 question-answer pairs.", "evaluation": "f1"}
{"query": "What are strong baseline models in specific tasks?", "gt": "state-of-the-art question answering models (i.e. QANet BIBREF39) and BERT-Base BIBREF26", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 10 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "\u5c0d\u65bc\u4e26\u5217\u8655\u7406\u7684\u554f\u984c\uff0c\u53ef\u4ee5\u4f7f\u7528\u4ee5\u4e0b\u5169\u7a2e\u65b9\u6cd5\uff1a\n\n1. \u5206\u5c64\u8655\u7406\uff1a\u5c07\u554f\u984c\u5206\u6210\u82e5\u5e72\u5c64\u6b21\uff0c\u6bcf\u500b\u5c64\u6b21\u8655\u7406\u4e00\u90e8\u5206\u554f\u984c\uff0c\u6700\u5f8c\u5c07\u6240\u6709\u5c64\u6b21\u7684\u7d50\u679c\u7d50\u5408\u8d77\u4f86\u3002\n2. 
\u5206\u6563\u5f0f\u8655\u7406\uff1a\u5c07\u554f\u984c\u5206\u6210\u82e5\u5e72\u90e8\u5206\uff0c\u6bcf\u500b\u90e8\u5206\u5728\u4e0d\u540c\u7684\u8655\u7406\u5668\u4e0a\u9032\u884c\u8655\u7406\uff0c\u6700\u5f8c\u5c07\u6240\u6709\u90e8\u5206\u7684\u7d50\u679c\u7d50\u5408\u8d77\u4f86\u3002\n\n\u5728\u5206\u5c64\u8655\u7406\u4e2d\uff0c\u53ef\u4ee5\u4f7f\u7528\u591a\u5c64\u795e\u7d93\u7db2\u7d61\u4f86\u8655\u7406\u4e0d\u540c\u5c64\u6b21\u7684\u554f\u984c\u3002\u5728\u5206\u6563\u5f0f\u8655\u7406\u4e2d\uff0c\u53ef\u4ee5\u4f7f\u7528\u5206\u5e03\u5f0f\u8a08\u7b97\u6846\u67b6\uff0c\u5982Hadoop\u3001Spark\u7b49\uff0c\u4f86\u5c07\u554f\u984c\u5206\u6210\u82e5\u5e72\u90e8\u5206\u9032\u884c\u8655\u7406\u3002\n\n\u5169\u7a2e\u65b9\u6cd5\u90fd\u6709\u5176\u512a\u9ede\u548c\u7f3a\u9ede\u3002\u5206\u5c64\u8655\u7406\u53ef\u4ee5\u66f4\u597d\u5730\u5229\u7528\u8a08\u7b97\u8cc7\u6e90\uff0c\u6e1b\u5c11\u901a\u4fe1\u6210\u672c\uff0c\u4e26\u4e14\u53ef\u4ee5\u66f4\u597d\u5730\u63a7\u5236\u554f\u984c\u7684\u8655\u7406\u904e\u7a0b\u3002\u5206\u6563\u5f0f\u8655\u7406\u53ef\u4ee5\u66f4\u597d\u5730\u5229\u7528\u8a08\u7b97\u8cc7\u6e90\uff0c\u63d0\u9ad8\u554f\u984c\u7684\u8655\u7406\u6548\u7387\uff0c\u4e26\u4e14\u53ef\u4ee5\u66f4\u597d\u5730\u61c9\u5c0d\u5927\u898f\u6a21\u554f\u984c\u3002\n\n\u7d9c\u5408\u4f86\u8aaa\uff0c\u5c0d\u65bc\u67d0\u4e9b\u554f\u984c\uff0c\u5206\u5c64\u8655\u7406\u53ef\u80fd\u66f4\u9069\u5408\uff0c\u800c\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd", "evaluation": "f1"}
{"query": "What is the previous work's model?", "gt": "Ternary Trans-CNN", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 2 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. What is Hinglish?\nHinglish is a linguistic blend of Hindi and English, widely spoken by over 350 million people in India. It is characterized by the mixing of English words with various Indian languages such as Punjabi, Gujarati, Marathi, and Hindi. The language is used extensively in social media platforms like Twitter, Facebook, online blogs, and reviews. However, its rampant usage has also led to its usage in delivering hate and abuses in similar platforms. The challenges posed by this problem include geographical variation, language and phonetics variation, no grammar rules, and spelling variation.\n2. What are the challenges posed by Hinglish from a modeling perspective?\nFrom a modeling perspective, Hinglish poses several challenges. Firstly, it follows largely fuzzy set of rules which evolves and is dependent upon the users preference. It doesn't have any formal definitions and thus the rules of usage are ambiguous. Secondly, the content may be highly influenced by the underlying region based on the geography of origination. Thirdly, the use of Hindi and English in a code-switched setting is highly influenced by the other languages spoken in India, which are 122 major languages and 1599 other languages according to the census in 2001. 
Fourthly, Hinglish has no fixed set of grammar rules, and the rules are inspired from both Hindi and English and when mixed with slur and slang produce large variation. Lastly, there is no agreement on the spellings of the words which are mixed with English.\n3. What is the dataset used in this study?\nThe dataset used in this study is the HEOT dataset, which was obtained from a past study done by Mathur et al. The dataset consists of cleaned tweets obtained from Twitter for the conversations happening in the Indian subcontinent. A labeled dataset for corresponding English tweets was also obtained from a study conducted by Davidson et al. This dataset was important to employ Transfer Learning to the task since the number of labeled dataset was very small.\n4. What are the challenges faced while working with the dataset?\nThe dataset faced several challenges while working with it. Firstly, the tweets had urls, punctuations, username mentions, hastags, em", "evaluation": "f1"}
{"query": "What dataset is used?", "gt": "HEOT , A labelled dataset for a corresponding english tweets", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 10 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "3189 rows of text messages of average length of 116 words and with a range of 1 to 1295.", "evaluation": "f1"}
{"query": "How big is the dataset?", "gt": "3189 rows of text messages", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 5 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "3189 rows, 116 words average length.", "evaluation": "f1"}
{"query": "How big is the dataset?", "gt": "Resulting dataset was 7934 messages for train and 700 messages for test.", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 12 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "3189 rows of text messages with an average length of 116 words.", "evaluation": "f1"}
{"query": "How is the dataset collected?", "gt": "A labelled dataset for a corresponding english tweets were also obtained from a study conducted by Davidson et al, HEOT obtained from one of the past studies done by Mathur et al", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 32 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. The dataset was collected from past studies done by Mathur et al. and Davidson et al.\n2. It consists of cleaned tweets obtained from Twitter for conversations happening in the Indian subcontinent.\n3. The dataset was important for employing Transfer Learning to the task as the number of labeled datasets was very small.", "evaluation": "f1"}
{"query": "Was each text augmentation technique experimented individually?", "gt": "No", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 1 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. Yes", "evaluation": "f1"}
{"query": "What models do previous work use?", "gt": "Ternary Trans-CNN , Hybrid multi-channel CNN and LSTM", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 8 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. Ternary Trans-CNN model\n2. Hybrid multi-channel CNN and LSTM model\n3. Fully Connected dense networks\n4. Convolution based architectures\n5. Sequence models (SimpleRNN, LSTM, GRU, Bidirectional LSTM)", "evaluation": "f1"}
{"query": "Does the dataset contain content from various social media platforms?", "gt": "No", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 1 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. Yes", "evaluation": "f1"}
{"query": "What additional techniques are incorporated?", "gt": "incorporating coding syntax tree model", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 5 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. Neural approach proposed\n2. SMT techniques used\n3. Data preparation and vocabulary generation\n4. LSTM model training\n5. Attention mechanism applied", "evaluation": "f1"}
{"query": "What dataset do they use?", "gt": "A parallel corpus where the source is an English expression of code and the target is Python code.", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 18 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. The authors used a text-code parallel corpus for training the SMT model.\n2. The parallel corpus contained 18,805 aligned data pairs.\n3. The source data consisted of code expressions in English, while the target data was in Python.", "evaluation": "f1"}
{"query": "Do they compare to other models?", "gt": "No", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 1 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "74.40%", "evaluation": "f1"}
{"query": "What is the architecture of the system?", "gt": "seq2seq translation", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 2 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. What is the main goal of the proposed methodology in the paper?\n2. What are the three main factors that make it difficult to create a tool for automatic code generation through natural language?\n3. What is the problem description in the paper?\n4. What is the proposed methodology in the paper?\n5. What is the result analysis of the training model in the paper?\n6. What is the conclusion of the paper?\n7. What is the architecture of the system?", "evaluation": "f1"}
{"query": "What additional techniques could be incorporated to further improve accuracy?", "gt": "phrase-based word embedding, Abstract Syntax Tree(AST)", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 6 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. Incorporating Abstract Syntax Trees (ASTs)\n2. Expanding training data with more examples\n3. Implementing advanced NLP techniques\n4. Integrating domain-specific knowledge\n5. Enhancing attention mechanisms\n6. Utilizing transfer learning from related tasks", "evaluation": "f1"}
{"query": "What programming language is target language?", "gt": "Python", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 1 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. Python", "evaluation": "f1"}
{"query": "What dataset is used to measure accuracy?", "gt": "validation data", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 2 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. What is the main goal of the proposed methodology in this paper?\n2. What are the three main factors that make it difficult to create a tool for automatic code generation through natural language?\n3. What is the problem description in this paper?\n4. What is the proposed methodology for translating pseudo-code or algorithm-like human language expressions into programming language code?\n5. What is the main advantage of translating to a programming language?\n6. What is the contribution of this research?\n7. What is the name of the survey that was published in 2017 that presented the state-of-the-art of the research areas where machine learning is changing the way programmers code during software engineering and development process?\n8. What is the name of the paper that discusses the restricting factors of developing such text-to-code conversion method and what problems need to be solved?\n9. What is the name of the paper that proposed a way to translate each line of Python code into natural language pseudocode using Statistical Machine Learning Technique (SMT) framework?\n10. What is the name of the paper that proposed a semantic parser which generates code through its neural model?", "evaluation": "f1"}
{"query": "What corpus was the source of the OpenIE extractions?", "gt": "domain-targeted $~$ 80K sentences and 280 GB of plain text extracted from web pages used by BIBREF6 aristo2016:combining", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 18 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. What is the goal of the research described in the document?\nThe goal of the research described in the document is to develop a QA system that can perform reasoning with Open IE tuples for complex multiple-choice questions that require tuples from multiple sentences. Such a system can answer complex questions in resource-poor domains where curated knowledge is unavailable.\n2. What is the challenge faced by QA systems in answering complex questions?\nThe challenge faced by QA systems in answering complex questions is the need for multi-fact reasoning and the complexity of the question structure.\n3. What is the name of the system presented in the document?\nThe name of the system presented in the document is TupleInf.\n4. What is the main difference between TupleInf and TableILP?\nThe main difference between TupleInf and TableILP is that TupleInf is based on ILP-based model of inference with tuples, while TableILP treats QA as a search for an optimal subgraph that connects terms in the question and answer via rows in a set of curated tables.\n5. 
What is the benefit of TupleInf over TableILP?\nThe benefit of TupleInf over TableILP is that TupleInf significantly outperforms TableILP by 11.8% on a broad set of over 1,300 science questions, without requiring manually curated tables, using a substantially simpler ILP formulation, and generalizing well to higher grade levels.", "evaluation": "f1"}
{"query": "What is the accuracy of the proposed technique?", "gt": "51.7 and 51.6 on 4th and 8th grade question sets with no curated knowledge. 47.5 and 48.0 on 4th and 8th grade question sets when both solvers are given the same knowledge", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 32 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. What is the main goal of the research described in the document?\nThe main goal of the research described in the document is to develop a QA system that can perform reasoning with Open IE tuples for complex multiple-choice questions that require tuples from multiple sentences. Such a system can answer complex questions in resource-poor domains where curated knowledge is unavailable.\n2. What is the challenge faced by QA systems in answering complex questions?\nThe challenge faced by QA systems in answering complex questions is their complex structure and the need for multi-fact reasoning.\n3. What is the name of the system presented in the document?\nThe name of the system presented in the document is TupleInf.\n4. How does TupleInf outperform TableILP?\nTupleInf outperforms TableILP by 11.8% on a broad set of over 1,300 science questions, without requiring manually curated tables, using a substantially simpler ILP formulation, and generalizing well to higher grade levels.\n5. 
What is the significance of the research described in the document?\nThe research described in the document demonstrates for the first time how Open IE based QA can be extended from simple lookup questions to an effective system for complex questions. It also presents a new ILP-based model of inference with tuples, implemented in a reasoner called TupleInf, which significantly outperforms TableILP.", "evaluation": "f1"}
{"query": "Is an entity linking process used?", "gt": "No", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 1 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. Yes", "evaluation": "f1"}
{"query": "Are the OpenIE extractions all triples?", "gt": "No", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 1 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. Yes.", "evaluation": "f1"}
{"query": "What method was used to generate the OpenIE extractions?", "gt": "for each multiple-choice question $(q,A) \\in Q_\\mathit {tr}$ and each choice $a \\in A$ , we use all non-stopword tokens in $q$ and $a$ as an ElasticSearch query against S, take the top 200 hits, run Open IE v4, and aggregate the resulting tuples over all $a \\in A$ and over all questions in $Q_\\mathit {tr}$", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 56 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "\n\nThe Open IE extractions were generated using Open IE v4, which is a state-of-the-art information extraction system. The text corpora used to build the tuple knowledge base (KB) were obtained from aristo2016:combining, and the training questions were used to retrieve domain-relevant sentences from this corpus. The Open IE system was then applied to these sentences to generate the tuples, which were aggregated to create the tuple KB (T). The system also used an inverted index to find the most relevant tuples from T and the text corpus (S) for a given multiple-choice question. On-the-fly tuples were also extracted from S to handle questions from new domains not covered by the training set.", "evaluation": "f1"}
{"query": "Can the method answer multi-hop questions?", "gt": "Yes", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 1 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. Yes", "evaluation": "f1"}
{"query": "What was the textual source to which OpenIE was applied?", "gt": "domain-targeted $~$ 80K sentences and 280 GB of plain text extracted from web pages used by BIBREF6 aristo2016:combining", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 18 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. The text source used for Open IE application is not explicitly mentioned in the document.", "evaluation": "f1"}
{"query": "What OpenIE method was used to generate the extractions?", "gt": "for each multiple-choice question $(q,A) \\in Q_\\mathit {tr}$ and each choice $a \\in A$ , we use all non-stopword tokens in $q$ and $a$ as an ElasticSearch query against S, take the top 200 hits, run Open IE v4, and aggregate the resulting tuples over all $a \\in A$ and over all questions in $Q_\\mathit {tr}$", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 56 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "\n\nThe Open Information Extraction (Open IE) method was used to generate the extractions in the document. Open IE is a technique that extracts structured information from unstructured text by identifying entities, their roles, and the relationships between them. In this work, Open IE was used to create a tuple knowledge base (KB) by extracting tuples (subject, predicate, object) from a text corpus. These tuples were then used to answer complex multiple-choice questions that require reasoning with multiple sentences. The authors of the document present a new inference model called TupleInf, which uses Integer Linear Programming (ILP) to search for an optimal subgraph connecting terms in the question and answer via rows in the tuple KB. The model outperforms the TableILP system, which uses curated tables and a more complex ILP formulation, demonstrating the effectiveness of Open IE-based QA for complex questions.", "evaluation": "f1"}
{"query": "Is their method capable of multi-hop reasoning?", "gt": "Yes", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 1 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. Yes", "evaluation": "f1"}
{"query": "What models do they propose?", "gt": "Feature Concatenation Model (FCM), Spatial Concatenation Model (SCM), Textual Kernels Model (TKM)", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 12 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. Propose novel hate speech detection in multimodal publications.\n2. Collect, annotate, and publish large-scale dataset (MMHS150K).\n3. Evaluate state-of-the-art multimodal models on specific task.\n4. Study challenges of proposed task and open field for future research.\n5. Create MMHS150K dataset with 150,000 tweets containing text and images.\n6. Use tweet gathering, textual image filtering, and annotation methods.\n7. Annotate tweets using Amazon Mechanical Turk and majority voting.\n8. Perform hate speech detection using unimodal treatments (images, text).\n9. Develop multimodal architectures: Feature Concatenation Model (FCM), Spatial Concatenation Model (SCM), Textual Kernels Model (TKM).\n10. Train models with Cross-Entropy loss, ADAM optimizer, and weighted loss.\n11. Analyze results and discuss limitations of multimodal models.\n12. Conclude importance of pushing forward research in multimodal hate speech detection.", "evaluation": "f1"}
{"query": "Are all tweets in English?", "gt": "Unanswerable", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 1 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. Yes", "evaluation": "f1"}
{"query": "How large is the dataset?", "gt": " $150,000$ tweets", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 2 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "150,000 tweets.", "evaluation": "f1"}
{"query": "What is the results of multimodal compared to unimodal models?", "gt": "Unimodal LSTM vs Best Multimodal (FCM)\n- F score: 0.703 vs 0.704\n- AUC: 0.732 vs 0.734 \n- Mean Accuracy: 68.3 vs 68.4 ", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 23 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. The paper proposes a novel task of hate speech detection in multimodal publications and collects, annotates, and publishes a large-scale dataset called MMHS150K.\n2. The authors evaluate state-of-the-art multimodal models on this specific task and compare their performance with unimodal detection.\n3. They study the challenges of the proposed task and open the field for future research.\n4. The paper summarizes existing hate speech datasets and related work on visual and textual data fusion.\n5. The MMHS150K dataset is created by gathering tweets containing images and text, filtering out those with only textual information, and annotating the remaining tweets.\n6. The dataset is divided into training, validation, and test sets.\n7. The authors use a CNN and an LSTM for unimodal treatment of images and tweet text, respectively.\n8. They also experiment with multimodal architectures, including feature concatenation, spatial concatenation, and textual kernels models.\n9. The results show that multimodal models do not outperform unimodal models in detecting hate speech.\n10. The authors suggest that the complexity and diversity of multimodal relations and the small set of multimodal examples may be reasons for this performance gap.", "evaluation": "f1"}
{"query": "What is author's opinion on why current multimodal models cannot outperform models analyzing only text?", "gt": "Noisy data, Complexity and diversity of multimodal relations, Small set of multimodal examples", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 13 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. The authors propose a new task of hate speech detection in multimodal publications.\n2. They collect, annotate, and publish a large-scale dataset called MMHS150K.\n3. The dataset includes 150,000 tweets, each containing text and an image.\n4. The authors evaluate state-of-the-art multimodal models on this specific task.\n5. They find that although images are useful for hate speech detection, multimodal models do not outperform unimodal textual models.\n6. The authors study the challenges of the proposed task and open the field for future research.\n7. The paper provides a comprehensive survey of related work in hate speech detection and visual and textual data fusion.\n8. The authors also discuss the limitations of existing datasets and annotation protocols.\n9. They propose a novel approach to hate speech detection that combines visual and textual information.\n10. The proposed approach outperforms existing methods on the MMHS150K dataset.\n11. The authors also provide an analysis of the performance of different models on the dataset.\n12. They conclude that the proposed approach is effective in detecting hate speech in multimodal publications.\n13. The paper provides a valuable contribution to the field of hate speech detection and opens up new avenues for future research.", "evaluation": "f1"}
{"query": "What metrics are used to benchmark the results?", "gt": "F-score, Area Under the ROC Curve (AUC), mean accuracy (ACC), Precision vs Recall plot, ROC curve (which plots the True Positive Rate vs the False Positive Rate)", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 27 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. F-score, Area Under the ROC Curve (AUC), and mean accuracy (ACC) are used to benchmark the results.\n2. These metrics evaluate the performance of the models in detecting hate speech and balancing the classification between hate and non-hate speech.", "evaluation": "f1"}
{"query": "How is data collected, manual collection or Twitter api?", "gt": "Twitter API", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 2 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. Twitter API", "evaluation": "f1"}
{"query": "How many tweats does MMHS150k contains, 150000?", "gt": "$150,000$ tweets", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 2 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "150,000 tweets.", "evaluation": "f1"}
{"query": "What unimodal detection models were used?", "gt": " single layer LSTM with a 150-dimensional hidden state for hate / not hate classification", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 14 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. What is the goal of this work?\nThe goal of this work is to distinguish between publications using offensive terms and publications attacking communities, which is referred to as hate speech.\n2. What is the MMHS150K dataset?\nThe MMHS150K dataset is a manually annotated multimodal hate speech dataset that contains 150,000 tweets, each one of them containing text and an image. It was created to provide a benchmark for hate speech detection in multimodal publications.\n3. What are the challenges of detecting hate speech in multimodal publications?\nThe challenges of detecting hate speech in multimodal publications include the complexity of the task, the subjectivity of the annotations, and the difficulty of learning the complex multimodal relations between visual and textual elements.\n4. What is the contribution of this work?\nThe contributions of this work include proposing the novel task of hate speech detection in multimodal publications, collecting, annotating, and publishing a large-scale dataset, and evaluating state-of-the-art multimodal models on this specific task.\n5. What related work has been done in the field of hate speech detection?\nRelated work in the field of hate speech detection includes surveys of the literature on detecting hate speech on online textual publications, comparisons of different classification methods for detecting hate speech in Reddit and other forums, and studies on hate speech detection on Twitter.", "evaluation": "f1"}
{"query": "What different models for multimodal detection were proposed?", "gt": "Feature Concatenation Model (FCM), Spatial Concatenation Model (SCM), Textual Kernels Model (TKM)", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 12 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. Feature Concatenation Model (FCM)\n2. Spatial Concatenation Model (SCM)\n3. Textual Kernels Model (TKM)", "evaluation": "f1"}
{"query": "What annotations are available in the dataset - tweat used hate speach or not?", "gt": "No attacks to any community,  racist, sexist, homophobic, religion based attacks, attacks to other communities", "prompt": "A chat between a curious user and an artificial intelligence assistant.The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Now you are given a very long document. Please follow the instruction after this document. These instructions may include summarizing a document, answering questions based on the document, or writing a required paragraph. Document is as follows. {document} \nInstruction: {inst} \nAnswer this question with 15 words.  \nASSISTANT: ", "vicuna-13b-16k-flash_pred": "1. The dataset contains 150,000 tweets with text and image, annotated as hate or non-hate speech.\n2. The annotations are based on a definition of hate speech that includes attacks on communities.\n3. The dataset is manually annotated through Amazon Mechanical Turk.\n4. Each tweet is labeled by three different workers to ensure consistency.\n5. The dataset is divided into training, validation, and test sets.", "evaluation": "f1"}
